UBC Theses and Dissertations


Genetic analysis of Huntington disease. Andrew, Susan E., 1994


Full Text

GENETIC ANALYSIS OF HUNTINGTON DISEASE

by

SUSAN E. ANDREW

B.Sc., The University of Toronto, 1987
M.Sc., Simon Fraser University, 1989

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
Genetics Programme

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1994
© Susan E. Andrew, 1994

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

ABSTRACT

Huntington disease (HD) is an autosomal dominant neurodegenerative disease characterized by progressive dementia and chorea. The initial aim of this thesis was to identify candidate regions for the HD gene. Markers separated by 3 Mb were found to be in strong allelic association with HD, thus identifying two mutually exclusive candidate regions.

A screen for genomic rearrangements in affected individuals using exonic clones from the proximal candidate region was undertaken. One auspicious clone showed a genomic rearrangement, cosegregating with HD in two families.
Subsequent cloning and sequencing of this region demonstrated an Alu retrotransposition event associated with Huntington disease in the two families.

During this work, the mutation causing Huntington disease was identified by the HD Collaborative Research Group as an expanded CAG trinucleotide repeat in a novel gene. Analysis of new polymorphic markers in the gene permitted a retrospective analysis of linkage disequilibrium in a 300 kb region harbouring the CAG repeat. Analysis of HD haplotypes showed that multiple haplotypes underlie CAG expansion. Mean CAG length on control chromosomes with haplotypes identical to those most frequently observed in HD was significantly larger than that on other control chromosomes, consistent with the hypothesis that these chromosomes are a reservoir for further expansions, resulting in HD chromosomes.

The nature of the dynamic trinucleotide repeat mutation resolved several previously confusing issues of HD. There is a highly significant correlation between the age of onset of Huntington disease and CAG repeat length, which accounts for approximately 50% of the variation in the age of onset. The instability of the repeat and its tendency to expand further over the generations account for the occurrence of new mutations and the observed anticipation. CAG expansion is the basis of the worldwide occurrence of HD and was shown to be a sensitive and specific marker for inheritance of the HD gene. A small proportion of clinically affected patients tested lacked CAG expansion, representing misdiagnosis, sample mix-up, clerical error or possible genetic heterogeneity. CAG analysis also allowed for the resolution of recombinant HD chromosomes that previously had suggested the gene was located in the distal candidate region.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
1. HUNTINGTON DISEASE
  1.1 INTRODUCTION
  1.2 HISTORY OF HD
  1.3 CLINICAL FEATURES
  1.4 GENETICS
    1.4.1 Inheritance
    1.4.2 Penetrance and expressivity
    1.4.3 Anticipation
    1.4.4 Juvenile onset
    1.4.5 Epidemiology
    1.4.6 Mutation rate
  1.5 NEUROPATHOLOGY
  1.6 POSITIONAL CLONING
    1.6.1 Positional cloning
    1.6.2 Chromosomal localization
    1.6.3 Establishment of a candidate region
    1.6.4 Identification of candidate genes
    1.6.5 Mutation analysis
  1.7 THE SEARCH FOR THE HD GENE
    1.7.1 Mapping the gene to 4p16.3
    1.7.2 Establishing the candidate region
    1.7.3 The HD mutation
  1.8 OBJECTIVE
  1.9 REFERENCES
2. MATERIALS AND METHODS
  2.1 GENETIC ANALYSIS
  2.2 DNA ISOLATION AND SOUTHERN BLOT
  2.3 DNA PROBES
  2.4 PREPARATION OF HYBRIDIZATION PROBES
  2.5 PCR PRIMERS
  2.6 SEQUENCING
    2.6.1 Double strand sequencing
    2.6.2 Single strand PCR products
  2.7 PREPARATION OF cDNA TEMPLATE
  2.8 STATISTICAL ANALYSIS OF ASSOCIATION
  2.9 STATISTICAL ANALYSIS OF CAG ANALYSIS
  2.10 REFERENCES
GENETIC ANALYSIS
3. NONRANDOM ALLELIC ASSOCIATION
  3.1 INTRODUCTION
    3.1.1 Allelic association
    3.1.2 Allelic association in the HD candidate region
  3.2 RESULTS
    3.2.1 Identification of new distal polymorphic markers
    3.2.2 Nonrandom allelic association across 6 Mb
    3.2.3 Haplotype analysis
    3.2.4 Analysis of homogeneous populations
  3.3 DISCUSSION
  3.4 REFERENCES
4. DNA ANALYSIS OF DISTINCT POPULATIONS
  4.1 INTRODUCTION
  4.2 RESULTS
    4.2.1 Assessment of nonrandom association
    4.2.2 DNA haplotype analysis of affected chromosomes
    4.2.3 DNA haplotype analysis of control chromosomes
    4.2.4 Haplotype comparisons
  4.3 DISCUSSION
  4.4 REFERENCES
5. PATTERNS OF ASSOCIATION AROUND THE HD GENE
  5.1 INTRODUCTION
  5.2 RESULTS
    5.2.1 DNA markers
    5.2.2 Statistical analysis
    5.2.3 Gene frequencies and allelic association
    5.2.4 Haplotype analysis
    5.2.5 Comparison of CAG length between haplotypes
    5.2.6 Haplotypes of sporadic HD patients
  5.3 DISCUSSION
  5.4 REFERENCES
SEARCH FOR GENOMIC REARRANGEMENTS
6. IDENTIFICATION OF AN ALU RETROTRANSPOSITION EVENT
  6.1 INTRODUCTION
    6.1.1 Gene Tracking
    6.1.2 GT clone analysis
  6.2 RESULTS
    6.2.1 GT48 genomic rearrangement
    6.2.2 Genomic cloning of Alu retrotransposition event
  6.3 DISCUSSION
  6.4 REFERENCES
MOLECULAR GENETIC ANALYSIS OF HUNTINGTON DISEASE
7. CAG EXPANSION IN HUNTINGTON DISEASE
  7.1 INTRODUCTION
  7.2 RESULTS
    7.2.1 Development of a PCR assay
    7.2.2 Association between CAG length and age of onset
    7.2.3 Correlation between clinical features and CAG length
    7.2.4 Variation in CAG repeat length in juvenile onset
    7.2.5 Predictive value of CAG repeat length
    7.2.6 Precision of CAG repeat assessment
    7.2.7 Parent-child correlations
    7.2.8 Sib-sib correlations
  7.3 DISCUSSION
  7.4 REFERENCES
8. CAG SENSITIVITY AND SPECIFICITY
  8.1 INTRODUCTION
  8.2 RESULTS
    8.2.1 CAG repeat sizes in HD and other disorders
    8.2.2 CAG repeat sizes in control chromosomes
  8.3 DISCUSSION
  8.4 REFERENCES
9. HUNTINGTON DISEASE WITHOUT CAG EXPANSION
  9.1 INTRODUCTION
  9.2 RESULTS
    9.2.1 Errors in assignment
    9.2.2 Misdiagnosis
    9.2.3 Absence of CAG expansion in HD families
    9.2.4 Genetic heterogeneity of HD?
  9.3 DISCUSSION
  9.4 REFERENCES
10. NEW MUTATIONS FOR HUNTINGTON DISEASE
  10.1 INTRODUCTION
  10.2 RESULTS
    10.2.1 Identification of a premutation
    10.2.2 Parental sex of origin
    10.2.3 New mutations without CAG repeat expansion
    10.2.4 Sporadic HD is transmitted to offspring
  10.3 DISCUSSION
  10.4 REFERENCES
11. A CCG REPEAT ADJACENT TO THE CAG REPEAT
  11.1 INTRODUCTION
  11.2 RESULTS
  11.3 DISCUSSION
  11.4 REFERENCES
12. DISCUSSION
  12.1 SUMMARY OF RESULTS
    12.1.1 Allelic association
    12.1.2 Allelic association around the CAG repeat
    12.1.3 Genomic rearrangement associated with HD
    12.1.4 CAG repeat analysis
  12.3 FURTHER INVESTIGATIONS
  12.4 REFERENCES

LIST OF TABLES

2-1 List of probes used
3-1 Summary of previous association studies
3-2 New polymorphisms
3-3 List of probes used in association analysis
3-4 Allele frequencies for markers from the HD candidate region
3-5 Methods for determining association with multiple alleles
3-6 Haplotypes of HD and control chromosomes
3-7 Haplotypes between markers in association with HD
3-8 Allele frequencies for RFLPs on HD and canonical chromosomes
3-9 Allele frequencies for RFLPs in a UK population
4-1 Polymorphic markers
4-2 Allele frequencies on HD and control chromosomes
4-3 HD haplotypes
4-4 Control haplotypes
5-1 Polymorphic markers used in analysis
5-2a Allele frequencies on HD and control chromosomes
5-2b Allele frequencies of CAG repeat on HD and control chromosomes
5-3a Haplotypes of HD and control chromosomes
5-3b CAG lengths of HD and control haplotypes
5-4 HD haplotypes from sporadic cases of HD
6-1 Summary of GT clones
6-2 Allele frequencies of 1.2 kb HindIII polymorphism
7-1 Demographics of cohort
7-2 CAG length
7-3 CAG lengths by sex of parent/grandparent
8-1 Distribution of HD allele sizes by origin
8-2 Distribution of CAG lengths in other neuropsychiatric disorders
8-3 CAG size distribution for control chromosomes by origin
9-1 Reasons for lack of CAG expansion
9-2 Possible phenocopies
9-3 Errors in assignment
10-1a New mutations with intermediate sized alleles
10-1b New mutations without IAs
11-1 Frequency of CCG alleles

LIST OF FIGURES

1-1 HD pedigree
1-2 Diagram of 4p16.3 showing markers
1-3 Candidate regions for HD
2-1 Schematic map of markers from 4p16.3
3-1 Map of candidate region showing location of probes
4-1 Schematic diagram of HD haplotypes
5-1 Map of markers flanking the HD gene
6-1 Gene tracking methodology
6-2 Mapping of cDNAs to BINs
6-3 Southern blot of rearrangement seen in 2 HD families: a) Family A; b) Family B
6-4 a) Haplotype of Family A with genomic rearrangement; b) Haplotype of Family B with genomic rearrangement
6-5 a) Localization of GT48; b) Restriction map of λGT48; c) Mapping of GT44 demonstrating HindIII polymorphism; d) Restriction digests of genomic rearrangement
6-6 a) PCR assay of Alu insertion; b) Sequence of Alu element
6-7 Pedigrees of families A and B showing CAG repeat sizes
6-8 Mapping of GT48 and the Huntington disease gene
7-1 Primers for amplification of the CAG repeat
7-2 Autoradiograph of the PCR products across the CAG repeat
7-3 Relationship of CAG repeat length and age of onset
7-4 Confidence intervals for predicted age of onset
7-5 Parent-child correlations of CAG repeat length
7-6 Sib-sib correlations of CAG repeat length
8-1 a) CAG repeat lengths of 995 HD patients; b) CAG repeat lengths of 600 control chromosomes; c) CAG repeat lengths of 995 normal chromosomes from HD patients
9-1 Phenocopy pedigrees 13-19
9-2 a) Pedigree of phenocopy patients 1 and 2; b) Pedigree of phenocopy patients 3 and 4; c) Pedigree of phenocopy patient 5; d) Pedigree of phenocopy patient 6; e) Pedigrees of phenocopy patients 7-10
10-1 Autoradiographs of families demonstrating intermediate sized alleles: a) Family 1; b) Family 2; c) Family 3; d) Family 4; e) Family 5
11-1 Schematic of CAG and CCG repeats and position of PCR primers
11-2 Autoradiograph showing CCG polymorphism

LIST OF ABBREVIATIONS

AO      age of onset
bp      base pair
CF      cystic fibrosis
cM      centiMorgan
DM      myotonic dystrophy
DRPLA   dentato-rubro-pallido-luysian atrophy
FRAXA   fragile X
FRAXE   fragile XE mental retardation
HD      Huntington disease
kb      kilobase
Mb      megabase
RFLP    restriction fragment length polymorphism
SBMA    spinal bulbar muscular atrophy
SCA     spinocerebellar ataxia

All pedigrees presented in this thesis have been altered to protect the patients.

ACKNOWLEDGEMENTS

It was a privilege to work with Dr. Michael Hayden and I am grateful for the guidance, encouragement and support which he consistently provided. I would also like to thank the members of my supervisory committee, Dr. Diana Juriloff, Dr. Peter Candido, Dr. Lorne Clark, and Dr. Paul Goodfellow, for all their advice during my training.

I would like to thank the patients with Huntington disease and their families, for without their support and participation, this work could not have been done.

There are many friends and colleagues who contributed greatly to this thesis, and I would like to thank them for their support and friendship. I would especially like to thank Dr. Paul Goldberg for all his help and total support both in and outside of the lab. Special thanks to Jane Theilmann, who has been a part of this thesis from beginning to end. Thank you to Dr. Johanna Rommens, Dr. Berry Kremer, Dr. Hakan Telenius, Dr. Bernhard Weber, Dr. Coffin Collins, Dr. Gordon Hutchinson, Dr. Nando Squitieri, Jutta Zeisler, and Amy Hedrick, who all contributed to this thesis.
And thank you to other friends in the Hayden lab for making the past four years so enjoyable.

Enormous thanks to my parents John and Catherine, and John S., always, for their love and constant support.

This thesis was supported in part by scholarships from the Huntington Society of Canada and UBC.

CHAPTER 1
HUNTINGTON DISEASE

1.1 INTRODUCTION

Approximately 5% of the population suffer from known genetic diseases [1], and if diseases with a suspected genetic component, such as schizophrenia, diabetes and coronary heart disease, are included, at least 60% of the population suffer from disease with some genetic basis [2]. At present, over 6000 disease genes have been mapped to a specific chromosomal region, and more than 200 genes with mutations underlying disease have been identified [3]. Haldane clearly stated the newly realized goals of human geneticists in 1948:

"The final aim...should be the enumeration and location of all the genes found in normal human beings, the function of each being deduced from the variations occurring when the said gene is altered by mutation, or when several allelomorphs of it exist in normal men and women..." [4]

One of the more recent mutations to be identified is the mutation associated with Huntington disease (HD), described as "one of the most dreadful diseases that man is liable to" [5].

1.2 HISTORY OF HD

The clinical features of HD were recorded as early as the 1500s, when Paracelsus went against the popular belief of his time and stated that the dancing mania was physiological and not due to supernatural causes. Paracelsus introduced the term chorea, from the Latin choreus, or dancing, and the Greek choros, meaning chorus [6]. In 1872, in his paper "On Chorea", Dr. George Huntington described this form of chorea so clearly and completely that the disease now carries his name [7].
As the presence and degree of chorea are variable, and chorea is only one feature, albeit a characteristic one, of this disorder, it is now more commonly referred to as Huntington disease rather than Huntington's chorea.

1.3 CLINICAL FEATURES

The onset of HD is insidious, characterized by involuntary choreic movements, dementia and other psychiatric disturbances. While the age of onset is variable, ranging from 2 to 90 years, the first symptoms appear at an average age of 38 [8]. Cognitive impairment precedes any movement disorder in over half of the patients afflicted with HD [9]. Subtle mental changes such as irritability, depression and impulsiveness often precede the diagnosis, and mild incoordination or jerkiness is often the first neurological symptom [6]. The severity of the chorea and cognitive impairment worsens as the disease progresses. Dysarthria, presenting as a slowness and disorganization of speech, and difficulty in swallowing are common [6].

HD is ineluctable: progressive worsening of clinical features results in death, usually 15-25 years after onset of symptoms, from heart attack, complications from aspiration (bronchopneumonia or choking), hematomas, or suicide [9]. Post-mortem examination of the brain, showing caudate atrophy, provides the definitive diagnosis of HD. Because definitive diagnosis is possible only after death, a positive family history is an important consideration in the clinical diagnosis of HD.

Choreic movements of the arms and legs, and disturbance of gait, are the most striking features of HD. However, five percent of patients, often those with juvenile onset, do not develop chorea but show a progressive slowing of movements towards a rigid state [6].

1.4 GENETICS

1.4.1 INHERITANCE

HD is inherited as an autosomal dominant disorder which, because of its adult onset, usually becomes symptomatic only after the gene has been transmitted to offspring.
With no known cure, this disease is not only devastating for the patient but also places each of the patient's offspring at 50% risk of inheriting a similar destiny.

Matings of two affected individuals are rare; however, two well documented individuals have been shown to be homozygous for the mutation associated with HD [10,11]. Interestingly, they do not differ in their clinical manifestation from those who have inherited one HD chromosome. Thus, HD represents a true autosomal dominant disorder, with homozygotes demonstrating a similar phenotype to heterozygotes. Although there are many well documented cases of completely dominant disorders in other organisms, such as Drosophila, of the 3600 or so autosomal dominant disorders known in humans [3], HD is the only one thought to be completely dominant [9].

1.4.2 PENETRANCE AND EXPRESSIVITY

HD demonstrates complete penetrance but displays variable expressivity. The wide spectrum of phenotype does not suggest genetic heterogeneity, however, as patients within one family can demonstrate a wide range of phenotypes. Intrafamilial variability can be considerable, as demonstrated by a family with ages of onset ranging from 20 to 50 in one generation, and in which, in the most recent generation, one individual developed HD at age 24 (Figure 1-1).

1.4.3 ANTICIPATION

Anticipation is a term used to describe the phenomenon of disease presentation over successive generations with progressively earlier ages of onset, or with increasing severity [12]. This has been observed in several genetic disorders, such as myotonic dystrophy (DM) [13], fragile X syndrome (FRAXA) [14], bipolar affective disorder [15] and HD. Anticipation was often discounted as a bias of ascertainment [16]. However, the recent cloning of the genes for DM [17-19], FRAXA [20-22], and now HD demonstrates a molecular mechanism for this phenomenon, which will be discussed further in Chapter 7.

Figure 1-1. Pedigree affected with Huntington disease, demonstrating autosomal dominant inheritance, variable age of onset and anticipation. AO = age of onset.

1.4.4 JUVENILE ONSET

In approximately 6% of HD patients the age of onset is before age 20; this is termed juvenile onset HD [6]. This group can be further divided into those with childhood onset, before the age of 10, and those who present with adolescent onset, from 10 to 20 years of age. Juvenile HD is often characterized by rigidity and tremor, and epileptic seizures occur in approximately 30% to 50% of affected cases [6]. The rate of progression of juvenile HD is much faster than that of later onset cases (duration of disease is 9 years on average for juveniles, as compared to 14.65 years for adult onset cases) [6].

In deviation from Mendelian inheritance, the majority of juvenile cases have inherited the defective gene from their father (64% with onset from age 10-20, 100% with onset before age 10) [23]. This is consistent with observations of anticipation in some non-juvenile HD families, where the gene is also inherited through the paternal line [24,25].

1.4.5 EPIDEMIOLOGY

HD has a prevalence of 4-10/100,000 in Northern Europe [6]. The disease is seen in all major racial groups, although with different frequencies. Several areas demonstrate a high prevalence of the disease gene, thought to be due to a founder effect. For example, at Lake Maracaibo in Venezuela there is a prevalence of 7000 per million, from one founder individual [6]. Also, many Afrikaner patients of South Africa can trace their ancestry back to 1652, when an affected Dutchman arrived in Cape Town [6].
Genealogical investigations in other countries such as Venezuela, Australia, Canada and the United States show northwestern Europe as the source of many HD chromosomes [6]. Several populations, such as the Japanese and American and African blacks, have a very low prevalence of the HD gene (3.8, 15 and 0.6 per million, respectively) [6].

1.4.6 MUTATION RATE

Several studies have determined the mutation rate for HD, and it was accepted that the mutation rate for HD was the lowest of all known genetic disorders. Estimates range from 0.13 mutants/10^6 gametes to 9.6 mutants/10^6 gametes [6]. One difficulty in determining the rate of mutation stems from the difficulty in confirming new mutations. Stevens and Parsonage defined the criteria for a new mutation as follows: the parents must be free of all symptoms of disease and have lived at least 70 years, paternity must be proven, and the disease must then be seen to be passed on to the proband's offspring [26]. However, these strict requirements meant that few confirmed cases fitting all the criteria were reported. Therefore, the mutation rate was, in all likelihood, underestimated. The identification of the HD mutation has allowed for direct confirmation of the number of new mutations arising from unaffected parents. Analysis of potential new mutation individuals who did not fit all of the above criteria confirms that the number of new mutations was previously underestimated.

1.5 NEUROPATHOLOGY

The biochemical basis for HD is unknown. Pathologically, HD is characterized by selective cell death in the basal ganglia, primarily within the striatum (caudate nucleus and putamen) [27,28]. Neuronal degeneration occurs with regional progression, starting in the tail of the caudate [29].
The striatum is divided into distinguishable regions called patches and matrix, and it is the cells in the matrix which die first as cell death progresses through the striatum [30,31]. Certain cell types are more prone to cell death than others, and it is the medium-sized spiny neurons, the largest proportion of striatal neurons, that are most affected [32]. Although neuronal loss in the basal ganglia is the most characteristic pathological feature of HD, as the disease progresses there is also cell loss in the cortex, the external segment of the globus pallidus, and eventually the hypothalamus; one notable exception is the cerebellum [9].

1.6 POSITIONAL CLONING

1.6.1 POSITIONAL CLONING

Positional cloning refers to the isolation of a gene on the basis of its map position alone, with no knowledge of the defective gene product [33]. This approach has been successful in the identification of genes for several human disorders, including chronic granulomatous disease [34], Duchenne muscular dystrophy [35], retinoblastoma [36] and cystic fibrosis [37]. Localization of a gene begins with identification of the chromosomal region associated with the disease, followed by refinement of the candidate region, identification of candidate genes within the region of interest, and assessment of these genes for potential disease-causing mutations.

Each gene search is different, and the characteristics of each particular disease and chromosomal location will dictate the appropriate technique, or combination of techniques, to be used to identify the gene and ascertain the disease-causing mutations.

1.6.2 CHROMOSOMAL LOCALIZATION

In order to locate a disease gene, linkage between the disease and a particular chromosomal marker must be found. To search for linkage, families harbouring the disease gene are screened with a series of polymorphic markers spanning the genome, and the likelihood ratio of linkage between the disease and each marker is determined.
A logarithm of the odds ratio, or lod score, of 3 means that the odds of linkage at a particular recombination fraction θ are 1000:1, and is accepted as evidence of linkage, whereas a lod score of -2 is indicative of non-linkage [38,39]. Localization of the marker to a specific chromosomal region by in situ hybridization or somatic cell mapping can allow for further refinement of the disease gene localization by multipoint linkage analysis with additional adjacent markers.

1.6.3 ESTABLISHMENT OF CANDIDATE REGION

Further mapping can refine the candidate region by making use of gross chromosomal rearrangements. Cytogenetically observed rearrangements have been critical in the isolation of genes such as those for Duchenne muscular dystrophy, Wilms' tumour and neurofibromatosis [41]. Recombination events can also be invaluable in defining the candidate region. However, the extent to which the candidate region can be narrowed in this manner is limited, usually to a range of 1-2 Mb, depending on the number of recombinant individuals, the number of informative markers, and the presence, if any, of DNA rearrangements.

Nonrandom allelic association between a marker and a disease is another method of defining a candidate region for a disease gene, exemplified by the localization of the cystic fibrosis gene [42]. Allelic association studies involve comparing allele frequencies in a group of affected patients to those of a normal control population. They are based on the principle that loci located closest to the site of mutation undergo recombination with it less frequently than those at a greater distance, and are therefore likely to exhibit a higher degree of nonrandom allelic association, or linkage disequilibrium. An increase in the measure of association between disease and markers along the chromosome is theoretically indicative of increasing proximity to the disease gene.
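The comparison of allele frequencies described above can be sketched as a 2×2 contingency test. This is a minimal illustration, assuming a single marker scored as "allele A" versus "all other alleles" on a set of HD chromosomes and a set of control chromosomes; it computes Pearson's chi-square statistic with one degree of freedom. The function name and the counts are hypothetical, not data or methods from this thesis (the statistics actually used are the subject of section 2.8). Note that chi-square grows with sample size and is a test of association, not itself a measure of the strength of linkage disequilibrium.

```python
import math

def allelic_association_chi2(hd_a, hd_other, ctrl_a, ctrl_other):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    Rows: HD chromosomes, control chromosomes.
    Columns: allele A, all other alleles.
    Returns (chi2, p_value). Hypothetical helper for illustration only.
    """
    table = [[hd_a, hd_other], [ctrl_a, ctrl_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [hd_a + ctrl_a, hd_other + ctrl_other]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele and disease status were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: allele A on 30 of 40 HD chromosomes and 10 of 40 controls.
chi2, p = allelic_association_chi2(30, 10, 10, 30)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

With these invented counts every expected cell is 20, so the statistic is 20.0; a table with identical allele frequencies in both groups gives 0. A column or row total of zero would make an expected count zero and raise a division error, so real data would need that case handled.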
Allelic association analysis assumes that new mutations are rare, and that affected chromosomes are derived from one or a few ancestral mutation events.

1.6.4 IDENTIFICATION OF CANDIDATE GENES

Establishment of the minimal candidate region allows for the construction of a physical map and cloning of the defined region of interest. This has been made easier by the ability to clone large regions of DNA in YACs [43] and P1 vectors [44], as well as in cosmids [45] and phage.

Several strategies are now established for identifying all the transcribed sequences present within the region of interest. Previous methods of finding transcribed sequences focused on the identification of CpG islands, conservation of DNA sequences on zoo blots, Northern blot analysis and classical screening of cDNA libraries. These have been superseded by techniques such as hybrid selection [47-49], exon trapping [50,51] and computational approaches [52,53], which allow for a more rapid analysis of larger regions of the genome by focusing on direct selection of transcribed sequences.

1.6.5 MUTATION ANALYSIS

Further analysis is necessary to evaluate candidate genes for alterations that could be causative for a particular disorder. Demonstration of an alteration cosegregating with disease, together with its absence in control individuals, suggests that the alteration may be responsible for the disease phenotype. Many rearrangements associated with disease are detected by Southern blot analysis. However, a variety of techniques have recently been developed for characterizing intragenic changes, including the RNase A protection assay [54], denaturing gradient gel electrophoresis [55], chemical mismatch cleavage [56] and single-strand conformational polymorphism (SSCP) analysis [57].
A definitive method to detect mutations is comparison of sequenced PCR products between patients and normal individuals [58].

1.7 THE SEARCH FOR THE HUNTINGTON DISEASE GENE

1.7.1 MAPPING THE GENE TO 4p16.3

In the years following Huntington's classic paper, progress towards a better understanding of the disease was limited; however, the development of molecular techniques initiated the search for the gene by a positional cloning approach.

Unlike other genetic disorders characterized by gross chromosome rearrangements, such as the large deletions and translocations on the X chromosome that aided isolation of the Duchenne muscular dystrophy gene [59], HD had no outstanding feature indicating a chromosomal location for the gene. In 1983, the use of polymorphic markers (RFLPs) to determine linkage of disease to specific chromosomes [60] was successfully illustrated by the discovery of linkage between HD and the 4th marker to be tested, named G8, subsequently mapped to 4p16.3 [61]. HD was shown to cosegregate with G8 (D4S10) in a large Venezuelan pedigree and an American family of German origin [61]. Multipoint linkage analysis using both families gave a combined lod score of 8.53 at θ = 0, with a 99% confidence interval of 10 cM. Although the idea of a linkage map based on frequency of recombination was first suggested in 1913 by Sturtevant [62], the HD gene was the first human disease gene to be linked to an autosome by a DNA marker without any previous information as to its chromosomal location.

The localization of G8 was further refined to the tip of the short arm of chromosome 4 by in situ hybridization [63,64] and somatic cell hybrid panels [65]. Multipoint linkage analysis with centromeric markers showed that HD was more tightly linked to D4S10 than to these markers, placing the disease gene between D4S10 and the telomere, a region estimated to contain 6 Mb of DNA [66] (Figure 1-2).

Examination of 63 HD families for linkage between HD and G8 gave a maximum lod score of 87.69 at θ = 0.04 [67].
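As a simplified illustration of how a maximum lod score and its θ are obtained, the sketch below assumes N phase-known, fully informative meioses of which R are recombinant, so that LOD(θ) = log10[θ^R (1 - θ)^(N - R) / 0.5^N], and searches a grid of θ values in [0, 0.5). The pedigree analyses cited above also had to handle unknown phase, age-dependent penetrance and missing genotypes, which this sketch ignores; the function names and the example counts are hypothetical.

```python
import math

def lod(theta, n_meioses, n_recombinant):
    """Lod score for phase-known, fully informative meioses (simplified model).

    Compares the likelihood of linkage at recombination fraction theta
    with the likelihood of free recombination (theta = 0.5).
    """
    r = n_recombinant
    nr = n_meioses - n_recombinant
    if theta == 0.0 and r > 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    log_l_linked = (r * math.log10(theta) if r else 0.0) + nr * math.log10(1 - theta)
    log_l_unlinked = n_meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked

def max_lod(n_meioses, n_recombinant, step=0.01):
    """Grid search over theta in [0, 0.5); returns (max lod, theta at max)."""
    thetas = [i * step for i in range(int(0.5 / step))]
    return max((lod(t, n_meioses, n_recombinant), t) for t in thetas)

# Hypothetical example: 10 informative meioses with no recombinants.
print(lod(0.0, 10, 0))
# Hypothetical example: 10 informative meioses with 1 recombinant.
print(max_lod(10, 1))
```

Ten non-recombinant meioses give a lod of 10·log10(2), just over 3, which matches the 1000:1 threshold for linkage; with one recombinant in ten, the maximum falls at θ = 0.1, the observed recombination fraction.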
The 4% recombination between G8 and HD gave only an approximate location for the HD gene, because of the unknown relationship between recombination and physical distance in this telomeric region. The combined data from several groups found no evidence for locus heterogeneity after examination of several populations [67-72], but a second locus not linked to 4p and causing the HD phenotype in some isolated families could not be completely excluded.

Figure 1-2. Schematic map of 4p16.3 showing relative locations of markers. Compiled from references 79 and 92.

The serendipitous finding of linkage with G8 (D4S10) allowed for identification of other polymorphic markers closely linked to HD, such as D4S95 [73], which permitted presymptomatic diagnosis for families and was the first step towards cloning of the gene.

1.7.2 ESTABLISHING THE CANDIDATE REGION

Analysis of families with recombination events between the HD gene and linked DNA markers has resulted in mutually exclusive candidate regions for the HD gene [74,75] (Figure 1-3). Many recombinant breakpoints were shown to occur at a hot spot for recombination lying centromeric to D4S10 [76-78], thus reducing the number of recombinants with breakpoints telomeric to D4S10, within the HD candidate region, that could contribute to refinement of the candidate region.

Several recombination events placed the HD gene proximal to D4S111, in a 2.2 Mb region between D4S10 and D4S98 [79-82]. In contrast, other recombinant HD chromosomes suggested a distal location for the HD gene, either distal to D4S111 [76,83], distal to D4S227 [75], or within a 325 kb region telomeric to a more distal segment, D4S90 [84]. The possibility of an extreme telomeric site for the gene, distal to D4S90, led to the cloning of this 300 kb telomeric region [85], and subsequently several lines of evidence resulted in the exclusion of the telomere.
For example, new markers distal to D4S90 indicated that onerecombinant chromosome previously thought to recombine distal to D4S90 maintained theunaffected haplotype throughout the telomeric region86. In addition, the lack of genes atthe telomere, and the lack of DNA rearrangements associated with HI) in this telomericregion87,as well as the discovery that the telomeric sequence of 4p is homologous to othertelomeric regions88 excluded a telomeric location for the HI) gene.However, despite exclusion of the telomere, two mutually exclusive candidate regions stillexisted. One important recombinant HI) chromosome added reason not to exclude theD4S1OD4S125D4S95D4S98D4S96D4S111D4S228D4S90III_(FIII1MbDiagrammaticrepresentationofthecandidateregionson4pl6.13forH])candidateregiondeterminedbyrecombinantchromosomes(references74-76,79-84,86,89)IlocalizationofrecombinationbreakpointinasporadiccaseofH]).(references75,89)Figure1-3.Candidateregionson4pl&3forHuntingtondiseasein1990.Compiledfromreferences74-89.15distal candidate region in the search for the HD gene75’89. This recombinant individualwas from a large family of German origin, and developed chorea and psychiatric problemsat age 36. Both his parents were healthy at ages greater than 70 years. There was nohistory of neurologic disorders in the family, and all 11 sibs of this patient were healthy.Analysis of the chromosomal haplotypes in this family revealed a recombination event hadoccurred in the proband in the distal region of 4pl6.3, between D4S 111 and D4S 141. 
It was hypothesized that the co-occurrence of a recombination event in the parental meiosis and a new mutation causing HD was unlikely to be due to chance alone, given the assumed low mutation rate and the decreased rate of recombination at the telomere; the recombination event itself was therefore hypothesized to underlie HD in at least this patient.

The mutually exclusive candidate regions led to the development of genetic and physical maps of both candidate regions and the cloning of each region in YAC and cosmid clones⁷⁶,⁸³,⁹⁰⁻⁹². Sequencing of a cosmid contig around D4S98 hinted that this is a gene-rich region, with a gene encoded every 20 kb⁹³. The identification of many genes from 4p16.3 during the search for the elusive HD gene supported this observation. For example, the β-subunit of phosphodiesterase gene and a myosin light chain gene were both identified in the distal candidate region⁹⁴,⁹⁵ and excluded as candidate genes for HD⁹⁶.

The development of improved methods to isolate coding sequences resulted in the identification of additional coding sequences from the proximal candidate region. One such strategy, outlined in Chapter 6, generated 53 coding sequences representing at least 9 transcription units, two of which were subsequently identified as sequences coding for the HD gene⁹⁷.

1.7.3 THE HUNTINGTON DISEASE MUTATION

The mutation associated with HD was identified by the HD Collaborative Research Group in March 1993⁹⁸. A polymorphic trinucleotide (CAG) repeat at the 5’ end of a novel gene located between D4S180 and D4S127 expands beyond the normal range of 10-36 repeats to more than 100 repeats in patients with HD⁹⁹. Two alternatively polyadenylated transcripts of 10.3 kb and 13.7 kb are derived from the gene associated with the CAG repeat⁹⁸,¹⁰⁰.
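The repeat criterion described above can be illustrated with a short Python sketch that finds the longest uninterrupted CAG run in a DNA sequence and labels it against the 10-36 normal range quoted in the text. This is an illustration under stated assumptions only: the function names and example sequences are hypothetical, allele sizing in this work relied on laboratory assays rather than on sequence strings, and treating every count above 36 as expanded is a simplification taken directly from the range given here.

```python
import re

NORMAL_MAX = 36  # upper bound of the normal range quoted in the text (10-36 repeats)


def longest_cag_run(seq: str) -> int:
    """Return the length, in repeat units, of the longest uninterrupted CAG run.

    CAG occurrences cannot overlap one another, so a left-to-right regex scan
    finds every maximal run.
    """
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify(repeats: int) -> str:
    """Label a repeat count relative to the 10-36 normal range (illustrative only)."""
    if repeats > NORMAL_MAX:
        return "expanded"
    if repeats >= 10:
        return "normal"
    return "below quoted range"
```

For example, `classify(longest_cag_run("CAG" * 45))` returns `"expanded"`, while a sequence whose longest run is 20 CAG units is labelled `"normal"`; an interruption such as CAA ends a run.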
The gene codes for a predicted 348 kDa protein that is widely expressed but has no known function⁹⁸. HD is the seventh disease now known to be caused by a dynamic mutation, the others being fragile X syndrome (FRAXA)¹⁰¹,¹⁰², fragile XE mental retardation (FRAXE)¹⁰³, myotonic dystrophy (DM)¹⁰⁴⁻¹⁰⁶, spinal and bulbar muscular atrophy (SBMA)¹⁰⁷, spinocerebellar ataxia type 1 (SCA1)¹⁰⁸, and dentato-rubro-pallido-luysian atrophy (DRPLA)¹⁰⁹,¹¹⁰.

1.8 OBJECTIVE

The objective of this thesis was to further refine the candidate region for the HD gene within the 6 Mb candidate region present at the start of this work. Allelic association was used to delineate the most likely candidate region. Two regions with markers in allelic association with HD were identified, separated by 3 Mb. Haplotype analysis suggested that multiple haplotypes underlie HD, even within a homogeneous population of affected individuals.

A search for genomic rearrangements was undertaken by screening patients with exonic clones mapping to the proximal candidate region, close to the marker in strong linkage disequilibrium with HD. One auspicious clone identified a genomic rearrangement cosegregating with HD in two families, and subsequent mapping, cloning and sequencing identified an Alu retrotransposition event associated with HD in these two families, representing a possible cause of disease in them.

The mutation associated with HD was identified by the HD Collaborative Research Group during the course of this thesis. Identification of the mutation thus redirected the aim of this thesis towards understanding the nature of the dynamic mutation. Towards this objective, genetic analysis of the CAG repeat in HD families explained several previously puzzling clinical features of HD and resolved the conflict between the mutually exclusive candidate regions suggested by recombinant individuals.
In addition, the cloning of the gene allowed for a retrospective analysis of the positional cloning approaches utilized and an assessment of the use of linkage disequilibrium in gene mapping. Towards this end, new polymorphic markers in the gene, together with previously used neighbouring markers, permitted a retrospective analysis of linkage disequilibrium in this region, and the construction of haplotypes provided unique insights into the history of the gene for HD.

The work presented in this thesis could not possibly have been done by one person alone. I have been fortunate to receive the help of several colleagues, especially Jane Theilmann and Amy Hedrick, who helped generate sufficient data for the linkage disequilibrium analyses. Much of the work in the Molecular Analysis section of this thesis involved teamwork and cooperation amongst members of the laboratory. My contribution is discussed in the Results section of each chapter, with work done by colleagues discussed in the Introduction only to provide the setting in which my work was done. Analysis of the CAG repeat also required a team effort by Dr. Paul Goldberg, Jane Theilmann, Jutta Zeisler and Dr. Nando Squitieri to generate such a large amount of data.

1.9 REFERENCES

1. Thompson MW, McInnes RR, Willard HF (1991). Genetics in Medicine, 5th ed. WB Saunders Co, Philadelphia, pp 10.
2. King RA, Rotter JI, Motulsky AG (1992). The Genetic Basis of Common Diseases. Oxford University Press, New York, pp 4.
3. McKusick VA (1989). Mendelian Inheritance in Man, 9th ed. Johns Hopkins University Press, Baltimore.
4. Haldane JBS (1948). The formal genetics of man. Proc Roy Soc London 135:147-170.
5. Davenport CB and Muncey EB (1916). Huntington’s chorea in relation to heredity and eugenics. Am J Insan 73:195-222.
6. Hayden MR (1981). Huntington’s chorea. Springer-Verlag, New York.
7. Huntington G (1872). On chorea. Med Surg Rep 26:317-255.
8. Conneally PM (1984). Huntington’s disease: genetics and epidemiology. Am J Hum Genet 36:506-526.
9.
Wexler NS, Rose EA, Housman DE (1991). Molecular approaches to hereditary diseases of the nervous system: Huntington disease as a paradigm. Ann Rev Neurosci 14:503-529.
10. Wexler NS, Young AB, Tanzi R, Travers H, Starosta-Rubenstein S, Penney JB, Snodgrass SR, Shoulson I, Gomez F, Arrayo MAR, Penchaszadeh GK, Moreno H, Gibbons K, Faryniarz A, Hobbs W, Anderson MA, Bonilla E, Conneally PM, Gusella JF (1987). Homozygotes for Huntington’s disease. Nature 326:194-197.
11. Myers R, Leavitt J, Farrer LA, Jagadeesh J, McFarlane H, Mastromauro CA, Mark RJ, Gusella JF (1989). Homozygotes for Huntington disease. Am J Hum Genet 45:615-618.
12. Howeler CJ, Busch HFM, Geraedts JPM, Niermeijer MF, Staal A (1989). Anticipation in myotonic dystrophy: fact or fiction? Brain 112:10-16.
13. Harper PS, Harley HG, Reardon W, Shaw DJ (1992). Anticipation in myotonic dystrophy: new light on an old problem. Am J Hum Genet 51:10-16.
14. Sherman SL, Jacobs PA, Morton NE, Froster-Iskenius U, Howard-Peebles PN, Nielsen KB, Partington MW (1985). Further segregation analysis of the fragile X syndrome with special reference to transmitting males. Hum Genet 69:289-299.
15. McInnis MG, McMahon FJ, Chase GA, Simpson SO, Ross CA, DePaulo JR (1993). Anticipation in bipolar affective disorder. Am J Hum Genet 53:385-390.
16. Penrose LS (1948). The problem of anticipation in pedigrees of dystrophia myotonica. Ann Eugenics 14:125-132.
17. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ, Housman DE (1992). Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3’ end of a transcript encoding a protein kinase family member. Cell 68:799-808.
18. Fu Y-H, Pizzuti A, Fenwick RG Jr, King J, Rajnarayan S,
Dunne PW, Dubel J, Nasser GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, Caskey CT (1992). An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255:1256-1258.
19. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barcelo J, O’Hoy K, Leblond S, Earle-MacDonald J, de Jong PJ, Wieringa B, Korneluk RG (1992). Myotonic dystrophy mutation: an unstable CTG repeat in the 3’ untranslated region of a candidate gene. Science 255:1253-1255.
20. Oberle I, Rousseau F, Heitz D, Kretz C, Devys D, Hanauer A, Boue J, Bertheas MF, Mandel JL (1991). Instability of a 550-base pair DNA segment and abnormal methylation in fragile X syndrome. Science 252:1097-1102.
21. Yu S, Pritchard M, Kremer E, Lynch M, Nancarrow J, Baker E, Holman K, Mulley JC, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Fragile X characterized by an unstable region of DNA. Science 252:1179-1181.
22. Fu Y-H, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJMH, Holden JJA, Fenwick RG Jr, Warren ST, Oostra BA, Nelson DL, Caskey CT (1991). Variation of the CGG repeat at the fragile X site results in genetic instability: resolution of the Sherman paradox. Cell 67:1-20.
23. Telenius H, Kremer HPH, Theilmann J, Andrew SE, Almqvist E, Anvret M, Greenberg C, Greenberg J, Lucotte G, Squitieri F, Starr E, Goldberg YP, Hayden MR (1993). Molecular analysis of juvenile Huntington disease: the major influence on CAG repeat length is the sex of the affected parent. Hum Mol Genet 2:1535-1540.
24. Ridley RM, Frith CD, Crow TJ, Conneally PM (1988). Anticipation in Huntington’s disease is inherited through the male line but may originate in the female. J Med Genet 25:589-595.
25. Farrer LA, Cupples LA, Kiely DK, Conneally PM, Myers RH (1992). Inverse relationship between age of onset disease and paternal age suggests involvement of genetic imprinting. Am J Hum Genet 50:528-535.
26. Stevens D and Parsonage MJ (1969).
Mutation in Huntington’s chorea. J Neurol Neurosurg Psychiatry 32:140-143.
27. Denny-Brown D (1962). The Basal Ganglia: Their Relation to Disorders of Movement. Oxford University Press, London.
28. Bird ED (1978). The brain in Huntington’s chorea. Psychol Med 8:357-360.
29. Vonsattel JP, Myers RH, Stevens TJ, Ferrante RJ, Bird ED (1985). Neuropathological classification of Huntington’s disease. J Neurol Exp Neuropathol 44:549-557.
30. Graybiel AM and Ragsdale CW (1978). Histochemically distinct compartments in striatum of human, monkey and cat demonstrated by acetylthiocholinesterase staining. Proc Natl Acad Sci USA 75:5723-5726.
31. Reiner A, Albin RL, Anderson KD, D’Amato CJ, Penney JB, Young AB (1988). Differential loss of striatal projection neurons in Huntington’s disease. Proc Natl Acad Sci USA 85:5733-5737.
32. Tobin AJ (1989). Huntington’s disease. In Neurobiology of Disease, RC Collins, AL Pearlman, eds. Oxford University Press, New York.
33. Collins FS (1992). Positional cloning: let’s not call it reverse anymore. Nature Genet 1:3-6.
34. Royer-Pokora B, Kunkel LM, Monaco AP, Goff SC, Newburger PE, Baehner RL, Cole FS, Curnutte JT, Orkin SH (1986). Cloning the gene for an inherited human disorder - chronic granulomatous disease - on the basis of its chromosomal location. Nature 322:32-38.
35. Monaco AP, Neve RL, Colletti-Feener C, Bertelson CJ, Kurnit DM, Kunkel LM (1986). Isolation of candidate cDNAs for portions of the Duchenne muscular dystrophy gene. Nature 323:646-650.
36. Friend SH, Bernards R, Rogelj S, Weinberg RA, Rapaport JM, Albert DM, Dryja TP (1986). A human DNA segment with properties of the gene that predisposes to retinoblastoma and osteosarcoma. Nature 323:643-646.
37. Rommens JM, Iannuzzi MC, Kerem BS, Drumm ML, Melmer G, Dean M, Rozmahel R (1989). Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245:1059-1065.
38. Morton NE (1955). Sequential tests for the detection of linkage. Am J Hum Genet 7:277-318.
39.
Conneally PM and Rivas M (1980). Linkage analysis in man. In Advances in Human Genetics, Vol 10, H Harris and K Hirschhorn, eds. Plenum Press, New York, pp 209-266.
40. Lathrop GM, Lalouel JM, Julier C, Ott J (1985). Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am J Hum Genet 37:482-498.
41. Tsui L-C and Estivill X (1991). Identification of disease genes on the basis of chromosomal localization. In Genome Analysis, Vol 3: Genes and Phenotypes. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 1-36.
42. Kerem B-S, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui L-C (1989). Identification of the cystic fibrosis gene: genetic analysis. Science 245:1073-1080.
43. Burke DT, Carle GF, Olson MV (1987). Cloning of large segments of exogenous DNA in yeast by means of artificial chromosome vectors. Science 236:806-811.
44. Sternberg N (1990). Bacteriophage P1 cloning system for isolation, amplification and recovery of DNA fragments as large as 100 kilobase pairs. Proc Natl Acad Sci USA 87:103-107.
45. Collins J and Hohn B (1978). Cosmids: a type of plasmid gene-cloning vector that is packageable in vitro. Proc Natl Acad Sci USA 75:4242-4250.
46. Parrish JE and Nelson DL (1993). Methods for finding genes: a major rate-limiting step in positional cloning. GATA 10:29-41.
47. Lovett M, Kere J, Hinton LM (1991). Direct selection: a method for the isolation of cDNAs encoded by large genomic regions. Proc Natl Acad Sci USA 88:9628-9632.
48. Rommens J, Lin B, Hutchinson GB, Andrew S, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Nasir J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Schappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.
49. Parimoo S, Patanjali SR, Shukla H, Chaplin DD, Weissman SM (1991). cDNA selection: efficient PCR approach for the selection of cDNAs encoded in large chromosomal DNA fragments.
Proc Natl Acad Sci USA 88:9623-9627.
50. Duyk GM, Kim S, Myers RM, Cox DR (1990). Exon trapping: a genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc Natl Acad Sci USA 87:8995-8999.
51. Buckler AJ, Chang DD, Graw SL, Brook JD, Haber DA, Sharp PA, Housman DE (1991). Exon amplification: a strategy to isolate mammalian genes based on RNA splicing. Proc Natl Acad Sci USA 88:4005-4009.
52. Uberbacher EC, Mural RJ (1992). Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc Natl Acad Sci USA 88:11261-11265.
53. Hutchinson G and Hayden MR (1992). The prediction of exons through an analysis of spliceable open reading frames. Nucl Acids Res 20:3453-3462.
54. Myers RM, Lumelsky N, Lerman LS, Maniatis T (1985). Detection of single base substitutions in total genomic DNA. Nature 313:495-498.
55. Myers RM, Maniatis T, Lerman LS (1987). Detection and localization of single base changes by denaturing gel electrophoresis. Methods Enzymol 155:501-527.
56. Saleeba JA, Ramus SJ, Cotton RGH (1992). Complete mutation detection using unlabelled chemical cleavage. Human Mutation 1:63-69.
57. Orita M, Suzuki Y, Sekiya T, Hayashi K (1989). Rapid and sensitive detection of point mutations and DNA polymorphisms using the polymerase chain reaction. Genomics 5:874-879.
58. Wong C, Dowling CE, Saiki RK, Higuchi RG, Erlich HA, Kazazian HH Jr (1988). Characterization of beta-thalassemia mutation using direct genomic sequencing of amplified single copy DNA. Nature 330:384-387.
59. Kunkel LM, Monaco AP, Middlesworth W, Ochs HD, Latt SA (1985). Specific cloning of DNA fragments absent from the DNA of a male patient with an X chromosome deletion. Proc Natl Acad Sci USA 82:4778-4782.
60. Botstein D, White RL, Skolnick M, Davis RW (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.
61.
Gusella JF, Wexler NS, Conneally PM, Naylor SL, Anderson MA, Tanzi RE, Watkins PC, Ottina K, Wallace MR, Sakaguchi AY, Young AB, Shoulson I, Bonilla E, Martin JB (1983). A polymorphic DNA marker genetically linked to Huntington’s disease. Nature 306:234-238.
62. Sturtevant AH (1913). The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. J Exp Zool 14:43-49.
63. Landegent SE, Jansen in de Wal N, Fisser-Groen YM, Baker E, van der Ploeg M, Pearson PL (1986). Fine mapping of the Huntington disease linked D4S10 locus by non-radioactive in situ hybridization. Hum Genet 73:354-357.
64. Wang HS, Greenberg CR, Hewitt J, Kalousek D, Hayden MR (1986). Subregional assignment of the linked marker G8 (D4S10) for Huntington disease to chromosome 4p16.3. Am J Hum Genet 39:392-396.
65. MacDonald ME, Anderson MA, Gilliam TC, Tranebjaerg L, Carpenter NJ, Magenis E, Hayden MR, Healey ST, Bonner TI, Gusella JF (1987). A somatic cell hybrid panel for localizing DNA segments near the Huntington disease gene. Genomics 1:29-34.
66. Gilliam TC, Tanzi RE, Haines JL, Bonner TI, Faryniarz AG, Hobbs WJ, MacDonald ME, Cheng SV, Folstein SE, Conneally PM, Wexler NS, Gusella JF (1987). Localization of the Huntington’s disease gene to a small segment of chromosome 4 flanked by D4S10 and the telomere. Cell 50:565-571.
67. Conneally PM, Haines JL, Tanzi RE, Wexler NS, Penchaszadeh GK, Harper PS, Folstein SE, Cassiman JJ, Myers RH, Young AB, Hayden MR, Falek A, Tolosa ES, Crespi S, Di Maio L, Holmgren, Anvret M, Kanazawa I, Gusella JF (1989). Huntington’s disease: no evidence for locus heterogeneity. Genomics 5:304-308.
68. Youngman S, Sarfarazi M, Quarrell OWJ, Conneally PM, Gibbons K, Harper PS, Shaw DJ, Tanzi RE, Wallace MR, Gusella JF (1986). Studies of a DNA marker (G8) genetically linked to Huntington disease in British families. Hum Genet 73:333-339.
69. Greenberg U, Martell RW, Theilmann J, Hayden MR, Joubert J (1991).
Genetic linkage between Huntington disease and the D4S10 locus in South African families: further evidence against non-allelic heterogeneity. Hum Genet 87:701-708.
70. Frontali M, Malaspina P, Rossi C, Jacopini AG, Vivona G, Pergola MS, Palena A, Novelletto A (1990). Epidemiological and linkage studies on Huntington’s disease in Italy. Hum Genet 85:165-170.
71. Kanazawa I, Kondo I, Ikeda JE, Ikeda T, Shizu Y, Yoshida M, Narabayashi H, Kuroda S, Tsunoda H, Mizuta E, Okuno Y, Sugawara K, Murata M, Takahashi M, Gusella JF (1990). Studies on DNA markers (D4S10 and D4S43/S127) genetically linked to Huntington’s disease in Japanese families. Hum Genet 85:257-260.
72. Ajmar F, Mandich P, Bellone E, Abbruzzese G (1991). Genetic analysis of Huntington disease in Italy. Am J Med Gen 39:211-214.
73. Wasmuth JJ, Hewitt J, Smith B, Allard D, Haines JL, Skarecky D, Partlow, Hayden MR (1988). A highly polymorphic locus very tightly linked to the Huntington disease gene. Nature 332:734-736.
74. Pritchard CA, Cox DR, Myers RM (1991). Invited editorial: the end in sight for Huntington disease. Am J Hum Genet 49:1-6.
75. Weber B, Riess O, Wolff G, Andrew S, Collins C, Graham R, Theilmann J, Hayden MR (1992). Delineation of a 50 kb DNA segment containing the recombination site in a sporadic case of Huntington’s disease. Nature Genet 2:216-222.
76. Skraastad MI, Bakker E, de Lange LF, Vegter van der Vlis M, Klein-Breteler EG, van Ommen GJB, Pearson PL (1989). Mapping of recombinants near the Huntington disease locus by using G8 (D4S10) and newly isolated markers in the D4S10 region. Am J Hum Genet 44:560-566.
77. Allitto BA, MacDonald ME, Bucan M, Richards J, Romano D, Whaley WL, Falcone B, Ianazzi I, Wexler NS, Wasmuth JJ, Collins FS, Lehrach H, Haines JL, Gusella JF (1991). Increased recombination adjacent to the Huntington disease-linked D4S10 marker. Genomics 9:104-112.
78. Richards JE, Gilliam TC, Cole JL, Drumm ML, Wasmuth JJ, Gusella JF, Collins FS (1988).
Chromosome jumping from D4S10 (G8) toward the Huntington disease gene. Proc Natl Acad Sci USA 85:6437-6441.
79. Bates GP, MacDonald ME, Baxendale S, Youngman S, Lin C, Whaley L, Wasmuth JJ, Gusella JF, Lehrach H (1991). Defined physical limits of the Huntington disease gene candidate region. Am J Hum Genet 49:7-16.
80. Whaley WL, Bates GP, Novelletto A, Sedlacek Z, Cheng S, Romano D, Ormondroyd E, Allitto B, Lin C, Youngman S, Baxendale S, Bucan M, Altherr M, Wasmuth J, Wexler NS, Frontali M, Frischauf A-M, Lehrach H, MacDonald ME, Gusella JF (1991). Mapping of cosmid clones in Huntington’s disease region of chromosome 4. Som Cell Mol Genet 17:83-91.
81. Snell RG, Thompson LM, Tagle DA, Holloway TL, Barnes G, Harley HG, Sandkuijl LA, MacDonald ME, Collins FS, Gusella JF, Harper PS, Shaw DJ (1992). A recombination event that redefines the Huntington disease region. Am J Hum Genet 51:357-362.
82. Barron L, Curtis A, Shrimpton AE, Holloway S, May H, Snell RG, Brock DJH (1991). Linkage disequilibrium and recombination make a telomeric site for the Huntington disease gene unlikely. J Med Genet 28:520-522.
83. MacDonald ME, Haines JL, Zimmer M, Cheng SV, Youngman S, Whaley WL, Wexler N, Bucan M, Allitto BA, Smith B, Leavitt J, Poustka A, Harper P, Lehrach H, Wasmuth JJ, Frischauf AM, Gusella JF (1989). Recombination events suggest potential sites for the Huntington disease gene. Neuron 3:183-190.
84. Robbins C, Theilmann J, Youngman S, Haines J, Altherr MJ, Harper PS, Payne C, Junker A, Wasmuth J, Hayden MR (1989). Evidence from family studies that the gene causing Huntington disease is telomeric to D4S95 and D4S90. Am J Hum Genet 44:422-425.
85. Bates GP, MacDonald ME, Baxendale S, Sedlacek Z, Youngman S, Romano D, Whaley WL, Allitto BA, Poustka A, Gusella JF, Lehrach H (1990). A yeast artificial chromosome telomere clone spanning a possible location of the Huntington disease-linked D4S10 marker. Genomics 9:104-112.
86.
Pritchard C, Zhu N, Zuo J, Bull L, Pericak-Vance MA, Vance JM, Roses A, Milatovich A, Francke U, Cox DR, Myers RM (1992). Recombination of 4p16 DNA markers in an unusual family with Huntington disease. Am J Hum Genet 50:1218-1230.
87. Pritchard C, Casher D, Bull L, Cox DR, Myers RM (1990). A cloned DNA segment from the telomeric region of human chromosome 4p is not detectably rearranged in Huntington disease patients. Proc Natl Acad Sci USA 87:7309-7313.
88. Youngman S, Bates GP, Williams S, McClatchey AI, Baxendale S, Sedlacek Z, Altherr M, Wasmuth JJ, MacDonald ME, Gusella JF, Sheer D, Lehrach H (1992). The telomeric 60 kb of chromosome arm 4p is homologous to telomeric regions on 13p, 15p, 21p, and 22p. Genomics 14:350-356.
89. Wolff G, Deuschl G, Wienker TF, Hummel K, Bender K, Lucking C, Schumacher M, Hammer J, Oepen G (1989). New mutation to Huntington’s disease. J Med Genet 26:18-27.
90. Buetow KH, Shiang R, Yang P, Nakamura Y, Lathrop GM, White R, Wasmuth JJ, Wood S, Berdahl LD, Leysens NJ, Ritty TM, Wise ME, Murray JC (1991). A detailed multipoint map of human chromosome 4 provides evidence for linkage heterogeneity and position-specific recombination rates. Am J Hum Genet 48:911-925.
91. Zuo J, Robbins C, Baharloo S, Cox DR, Myers R (1993). Construction of cosmid contigs and high-resolution restriction mapping of the Huntington disease region of human chromosome 4. Hum Mol Genet 2:889-899.
92. Bucan M, Zimmer M, Whaley WL, Poustka A, Youngman S, Allitto BA, Ormondroyd E, Smith B, Pohl TM, MacDonald ME, Bates GP, Richards J, Volinia S, Gilliam TC, Sedlacek Z, Collins FS, Wasmuth JJ, Shaw DJ, Gusella JF, Frischauf AM, Lehrach H (1990). Physical maps of 4p16.3, the area expected to contain the Huntington disease mutation. Genomics 6:1-15.
93. McCombie WR, Martin-Gallardo A, Gocayne JD, Fitzgerald M, Dubnick M, Kelley JM, Castilla L, Liu LI, Wallace S, Trapp S,
Tagle D, Whaley WL, Cheng S, Gusella J, Frischauf A-M, Poustka A, Lehrach H, Collins FS, Kerlavage AR, Fields C, Venter JC (1992). Expressed genes, Alu repeats and polymorphisms in cosmids sequenced from chromosome 4p16.3. Nature Genet 1:348-353.
94. Weber B, Riess O, Hutchinson G, Collins C, Lin B, Kowbel D, Andrew S, Schappert K, Hayden MR (1992). Genomic organization and complete sequence of the human gene encoding the β-subunit of the cGMP phosphodiesterase and its localisation to 4p16.3. Nucl Acids Res 19:6263-6268.
95. Collins C, Schappert K, Hayden MR (1992). The genomic organization of a novel regulatory myosin light chain (MYL5) that maps to chromosome 4p16.3 and shows different patterns of expression between primates. Hum Mol Genet 1:727-733.
96. Riess O, Noerremoelle A, Collins C, Mah D, Weber B, Hayden MR (1992). Exclusion of DNA changes in the β-subunit of the cGMP phosphodiesterase gene as the cause for Huntington disease. Nature Genet 1:104-108.
97. Rommens J, Lin B, Hutchinson GB, Andrew SE, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Schappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.
98. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
99. Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman MA, Graham RK, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington’s disease. Nature Genet 4:398-403.
100. Lin B, Rommens JM, Graham RK, Kalchman M, McDonald H, Nasir J, Delaney A, Goldberg YP, Hayden MR (1994). Differential 3’ polyadenylation of the Huntington disease gene results in two mRNA species with variable tissue expression. Hum Mol Genet 2:1541-1545.
101.
Yu S, Pritchard M, Kremer E, Lynch M, Nancarrow J, Baker E, Holman K, Mulley JC, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Fragile X genotype characterized by an unstable region of DNA. Science 252:1179-1181.
102. Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Mapping of DNA instability at the fragile X to a trinucleotide repeat sequence p(CCG)n. Science 252:1711-1714.
103. Knight SJL, Flannery AV, Hirst MC, Campbell L, Christodoulou Z, Phelps SR, Pointon J, Middleton-Price HR, Barnicoat A, Pembrey ME, Holland J, Oostra BA, Bobrow M, Davies KE (1993). Trinucleotide repeat amplification and hypermethylation of a CpG island in FRAXE mental retardation. Cell 74:127-134.
104. Fu Y-H, Pizzuti A, Fenwick RG Jr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasser GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, Caskey CT (1992). An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255:1256-1258.
105. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barcelo J, O’Hoy K, Leblond S, Earle-MacDonald J, de Jong PJ, Wieringa B, Korneluk RG (1992). Myotonic dystrophy mutation: an unstable CTG repeat in the 3’ untranslated region of the gene. Science 255:1253-1255.
106. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ, Housman DE (1992). Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3’ end of a transcript encoding a protein kinase family member. Cell 68:799-808.
107. La Spada AR, Wilson EM, Lubahn DB, Harding AE, Fischbeck KH (1991). Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature 352:77-79.
108.
Orr HT, Chung M, Banfi S, Kwiatkowski TJ Jr, Servadio A, Beaudet AL, McCall AE, Duvick LA, Ranum LPW, Zoghbi HY (1993). Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nature Genet 4:221-226.
109. Nagafuchi S, Yanagisawa H, Sato K, Shirayama T, Ohsaki E, Bundo M, Takeda T, Tadokoro K, Kondo I, Murayama N, Tanaka Y, Kikushima H, Umino K, Kurosawa H, Furukawa T, Nihei K, Inoue T, Sano A, Komure O, Takahashi M, Yoshizawa T, Kanazawa I, Yamada M (1994). Expansion of an unstable CAG trinucleotide on chromosome 12p in dentatorubral pallidoluysian atrophy. Nature Genet 6:14-18.
110. Koide R, Ikeuchi T, Onodera O, Tanaka H, Igarashi S, Endo K, Takahashi H, Kondo R, Ishikawa A, Hayashi T, Saito M, Tomoda A, Miike T, Naito H, Ikuta F, Tsuji S (1994). Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nature Genet 6:9-13.

CHAPTER 2

MATERIALS AND METHODS

2.1 GENETIC ANALYSIS

All family DNA used was part of the Canadian Predictive Testing Program for HD or was contributed separately by HD families for research purposes. The clinical diagnosis of HD was made by a neurologist or geneticist. Clinical details of patients were obtained from extensive records, documented neurological examination, and special investigations such as computerized positron emission tomography (PET) and autopsy records.

Approximately 80% of the DNA samples banked in Vancouver are from individuals of UK or Western European descent. For family studies, alleles in unrelated family members and the non-HD chromosome of affected persons were used as control chromosomes. For linkage disequilibrium studies, it was assumed that an unaffected spouse was of similar ancestry to the affected patient, and that allele frequencies determined from control chromosomes were an accurate estimate of the allele frequencies in the population from which the HD patients are drawn.
However, the measure of disequilibrium is highly dependent on allele frequencies, and the possibility of inaccurate matching of controls to HD patients is a limitation of this type of analysis. To overcome this, homogeneous populations with confirmed ancestral similarity between controls and affected individuals were analyzed in Chapter 4.

For allelic association studies, one affected chromosome was counted per pedigree, together with as many unique control chromosomes as could be determined. The number of affected individuals and controls analyzed with each probe varied depending on the availability of DNA and the informativeness of each probe. For haplotype analysis, affected and control haplotypes were constructed from family data where phase was unequivocally determined.

2.2 DNA ISOLATION AND SOUTHERN BLOT ANALYSIS

DNA was extracted from leukocytes by standard extraction procedures¹. For Southern blot analysis, 5 µg of genomic DNA was digested to completion with the appropriate restriction enzyme (BRL). DNA was fractionated by electrophoresis on an agarose gel (0.7 to 1.2%), then transferred to Hybond N membranes (Amersham)².

Southern blots were prehybridized for 1 hour and hybridized overnight at 65°C in 0.5 M sodium phosphate buffer, pH 7.2, 7% SDS and 1 mM EDTA³. Blots were washed twice for 20 minutes each in 0.5X SSC, 0.1% SDS at 65°C, followed by a final stringent wash in 0.1X SSC, 0.1% SDS at 65°C for 5 minutes. Autoradiography exposures ranged from overnight to 4 days.

2.3 DNA PROBES

All probes used are listed in Table 2-1. The order of the DNA markers has been previously determined and is shown in Figure 2-1¹⁶,¹⁷.

2.4 PREPARATION OF HYBRIDIZATION PROBES

Cloned DNA inserts were prepared as probes for hybridization as follows. DNA inserts were isolated from vector sequence by digestion with the appropriate restriction enzyme (BRL) and purified from low melting point agarose gel (BRL) slices. All DNA probes were labeled by oligolabeling¹⁸ and purified on a G-25 spin column¹⁹.
Probes were blocked for repetitive elements prior to hybridization by boiling for 10 minutes together with 300 µg of sonicated total human DNA, followed by pre-annealing at 65°C for 1 hour.

2.5 PCR PRIMERS

Oligonucleotides were synthesized on a PCR-Mate 391 DNA synthesizer (Applied Biosystems) and purified by reverse phase chromatography (Sep-Pak C18, Waters) according to Atkinson and Smith²⁰.

Table 2-1. List of previously published probes used in this thesis.

PROBE                LOCUS     REFERENCE NUMBER
P8                   D4S62     4
G8                   D4S10     5
pBS674E-D (674)      D4S95     6
127CA                D4S127    12
S7                   D4S43     7
pBS731B-C (731)      D4S98     8
pBS678 (678)         D4S96     8
p157-9 (157)         D4S111    9
Rs3                  D4S227    10
Ac2                  D4S227    10
BS1                  D4S133    11
cl6Dp/E2Rep (E2)     D4S228    10
cl6Dp/M4.2 (4.2)     D4S228    10
2R3                  D4S141    13

[Figure 2-1. A schematic map of 4p16.3 showing approximate distances between markers, from the centromeric side to 4pter (markers shown include D4S62 (P8), D4S10 (G8), D4S180, D4S95 (674), D4S96 (678), D4S111 (157), D4S227, D4S228 (E2 and 4.2), D4S43 (S7), D4S168, D4S125 (BJ56), D4S182, D4S127, D4S98 (731), D4S133 (BS1), D4S141 (2R3) and D4S90; scale in Mb). Compiled from references 4 to 13.]

2.6 DNA SEQUENCING

2.6.1 DOUBLE STRAND SEQUENCING OF CLONED PRODUCTS

Clones were sequenced manually by the dideoxy sequencing method (Sequenase Kit, USB).

2.6.2 SEQUENCING OF SINGLE STRAND PCR PRODUCTS

Asymmetric PCR was used to generate single-strand DNA templates from PCR products according to Sambrook¹⁰. Double-strand PCR product was obtained after initial amplification of first-strand cDNA. The PCR fragments were then purified using GeneClean (Bio Can 101 Inc) and used as a template for generating single-strand PCR products over 45 cycles, using a 100:1 ratio of the two oligonucleotide primers. Single-strand asymmetric PCR products were purified by centrifugal filtration (30000 NMWL filter, Millipore) and finally sequenced with 1 pmole of the limiting primer by the dideoxy sequencing method (Sequenase Kit, USB).

2.7 PREPARATION OF cDNA TEMPLATE

First-strand cDNA was prepared from previously isolated RNA from control and affected tissue.
Five micrograms of RNA was reverse transcribed using the Superscript preamplification system (Amersham) with 0.5 pmole random hexamers and 0.5 pmole oligo(dT), 0.1 mM dNTPs, 10 mM DTT, 36 units RNasin, 200 units reverse transcriptase, and RT buffer (20 mM Tris-HCl (pH 8.3), 50 mM KCl, 2.5 mM MgCl2, 0.1 mg/ml BSA). First strand cDNA was diluted 1:100 and used as PCR template.

2.8 STATISTICAL ANALYSIS OF NONRANDOM ALLELIC ASSOCIATION

To determine the extent of nonrandom allelic association, or linkage disequilibrium, a correlation coefficient, r, was estimated as defined previously [21, 22]:

$$r = \frac{D}{\sqrt{p_1 p_2 q_1 q_2}}$$

where $p_1$ and $p_2$ are the frequencies of the alleles of locus A, and $q_1$ and $q_2$ are the frequencies of the alleles of locus B. D is the linkage disequilibrium parameter, defined as

$$D = P_{11} - p_1 q_1$$

where $P_{11}$ is the frequency of the $A_1B_1$ haplotype. A positive value of D indicates that the two most common alleles at each locus are in association, whereas a negative value indicates association between a common allele at one locus and a rare allele at the other. One constraint of D as a measure of allelic association is its dependence on allele frequencies. The r value varies between +1.0 and -1.0. A chi-square test of the null hypothesis of no association was given by

$$\chi^2 = N r^2$$

where N is the total number of gametes in the sample. The chi-square statistic has $(m-1)(n-1)$ degrees of freedom, where m and n are the numbers of alleles at the two loci.

For comparisons involving loci with more than two alleles, the most common allele at the locus was defined as one allele and the remaining alleles were pooled to form a second. Although this approach has been taken in prior studies of linkage disequilibrium [20], pooling can alter the statistical power of the disequilibrium test.
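As a worked illustration of the definitions above, the following Python sketch computes D, r and χ² = Nr² from a 2 × 2 table of chromosome counts. It is not the Systat procedure used in this thesis; the function name `ld_two_locus` is hypothetical, and disease status is treated as one of the two "loci", as in the association analyses of Chapter 3. The example counts are those listed for BS1 (D4S133) in Table 3-4.

```python
import math


def ld_two_locus(n11, n12, n21, n22):
    """Return (D, r, chi2) for two biallelic loci A and B.

    n_ij is the number of gametes (chromosomes) carrying allele A_i at
    locus A and allele B_j at locus B.
    """
    n = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / n          # frequency of allele A1
    q1 = (n11 + n21) / n          # frequency of allele B1
    p2, q2 = 1.0 - p1, 1.0 - q1
    big_p11 = n11 / n             # frequency of the A1B1 haplotype
    d = big_p11 - p1 * q1         # D = P11 - p1*q1
    r = d / math.sqrt(p1 * p2 * q1 * q2)
    chi2 = n * r * r              # chi-square = N r^2, 1 degree of freedom
    return d, r, chi2


# Disease status as "locus A" (A1 = HD, A2 = control) and the BS1 MboI
# polymorphism as "locus B" (B1 = allele A, B2 = allele B).
d, r, chi2 = ld_two_locus(n11=4, n12=94, n21=65, n22=345)
print(f"D = {d:.4f}  r = {r:.3f}  chi2 = {chi2:.2f}")
```

The sign of r depends only on which allele at each locus is labelled 1; its magnitude, and therefore χ², is unchanged if the labels are swapped.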
Further analysis was therefore undertaken using a chi-square test for linkage disequilibrium between multiallelic loci, previously defined [23, 24] as

$$\chi^2 = N \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{D_{ij}^2}{p_i q_j}$$

In this summation approach, locus A has m alleles $A_i$, $i = 1, \ldots, m$, and locus B has n alleles $B_j$, $j = 1, \ldots, n$; the population frequencies of alleles $A_i$ and $B_j$ are written as $p_i$ and $q_j$, and N is the total number of gametes in the sample [23, 24]. $D_{ij}$ is given by

$$D_{ij} = P_{ij} - p_i q_j$$

where $P_{ij}$ is the frequency of the $A_iB_j$ haplotype. The chi-square statistic has $(m-1)(n-1)$ degrees of freedom.

Yule's association coefficient (|Q|) is another measure of the degree of allelic association that is less dependent than the D statistic on allele frequencies. It was calculated as

$$|Q| = \left| \frac{ad - bc}{ad + bc} \right|$$

where a is the number of control chromosomes with allele A, b the number of HD chromosomes with allele A, c the number of control chromosomes with allele B, and d the number of HD chromosomes with allele B. For multiallelic RFLPs, Yule's coefficient was calculated using the most common allele versus the pooled remaining alleles. |Q| ranges from 0 to 1, with a |Q| of 1 representing maximum allelic association. Although this coefficient dates to 1968, its successful use in defining the minimal candidate region for CF and pinpointing the region of the CF mutation confirmed its appropriateness and reinstated its use in allelic association studies [37].

Pairwise haplotype analysis for control chromosomes was performed by comparing the number of observed haplotypes with the number expected from allele frequencies determined in controls. Pairwise haplotype analysis for HD chromosomes was performed by comparing the number of observed haplotypes on HD chromosomes with the number seen in controls.

2.9 STATISTICAL ANALYSIS OF CAG REPEAT LENGTH

To examine the relationship between age of onset and CAG length, linear regression was used with logarithmic transformation of the age of onset, which allows an exponential relationship to be treated as an intrinsically linear model.
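The log-linear model can be sketched as follows, assuming ordinary least squares on log(age of onset) and the usual prediction-interval formula for a single new observation, as given in standard regression texts such as reference 26. The data, the function names and the supplied t quantile are hypothetical; the thesis analyses themselves were carried out in Systat.

```python
import math


def fit_log_linear(cag, onset):
    """Least-squares fit of log(onset) = b0 + b1 * cag.

    Returns the coefficients, the residual standard error s on the log
    scale (n - 2 df), and the quantities needed for prediction limits.
    """
    n = len(cag)
    y = [math.log(age) for age in onset]          # log-transform age of onset
    x_bar = sum(cag) / n
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in cag)
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(cag, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * x)) ** 2 for x, yi in zip(cag, y))
    s = math.sqrt(sse / (n - 2))
    return {"b0": b0, "b1": b1, "s": s, "x_bar": x_bar, "sxx": sxx, "n": n}


def predict_onset(fit, cag_length, t_crit):
    """Back-transformed prediction and prediction limits for one new person.

    t_crit is the two-sided Student t quantile with n - 2 df; it is passed
    in because the Python standard library has no t-distribution quantile.
    """
    y_hat = fit["b0"] + fit["b1"] * cag_length
    se_pred = fit["s"] * math.sqrt(
        1 + 1 / fit["n"] + (cag_length - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return (math.exp(y_hat),
            math.exp(y_hat - t_crit * se_pred),
            math.exp(y_hat + t_crit * se_pred))


# Hypothetical (CAG length, age of onset) pairs, for illustration only.
cag = [40, 42, 45, 48, 52, 60]
onset = [55, 48, 41, 35, 28, 18]
fit = fit_log_linear(cag, onset)
# 2.776 is the 97.5th percentile of Student's t with 4 df (n - 2 = 4).
print(predict_onset(fit, 46, t_crit=2.776))
```

Because the limits are symmetric on the log scale, they become asymmetric after back-transformation; and if the log-scale errors are normal, the exponential of the fitted value estimates the median rather than the mean age of onset.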
The use of untransformed values yielded a regression line that crossed the x-axis at a CAG length of about 85, implying that longer repeats would be associated with a negative age of onset, which is impossible; this clearly supports the use of transformed values in this analysis. Furthermore, visual inspection of the plots and examination of the residuals both indicated that the exponential model was superior to the linear model. Data on all other clinical features, such as age of onset of chorea or age at death, were similarly log-transformed. Confidence limits for prediction were calculated from the log-transformed data (using standard formulas [26]) and then back-transformed to absolute ages of onset. All statistical analyses were done using Systat.

2.10 REFERENCES

1. Kunkel LM, Smith KD, Boyer SH, Borgaonkar DS, Wachtel SS, Miller OJ, Breg WR, Jones HW Jr, Rary JM (1977). Analysis of human Y chromosome-specific reiterated DNA in chromosome variants. Proc Natl Acad Sci USA 74:1245-1249.
2. Southern EM (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503-517.
3. Church GM and Gilbert W (1984). Genomic sequencing. Proc Natl Acad Sci USA 81:1991-1995.
4. Hayden MR, Hewitt J, Wasmuth JJ, Kastelein JJ, Langlois S, Conneally M, Haines J, Smith B, Hubert C, Allard D (1988). A polymorphic DNA marker that represents a conserved expressed sequence in the region of the Huntington disease gene. Am J Hum Genet 42:125-131.
5. Gusella JF, Wexler NS, Conneally PM, Naylor S, Anderson RE, Tanzi RE, Watkins K, Ottina K, Wallace A, Sakaguchi A, Young I, Shoulson E, Bonilla E, Martin JB (1983). A polymorphic DNA marker genetically linked to Huntington's disease. Nature 306:234-238.
6. Wasmuth JJ, Hewitt J, Smith B, Allard D, Haines JL, Skarecky D, Parlow E, Hayden MR (1988). A highly polymorphic locus very tightly linked to the Huntington disease gene. Nature 332:734-736.
7. Gilliam TC, Bucan M, MacDonald ME, Zimmer M, Haines JL, Cheng SV, Pohl TM (1987). A DNA segment encoding two genes very tightly linked to Huntington's disease. Science 238:950-952.
8. Smith B, Skarecky D, Bengtsson U, Magenis RE, Carpenter N, Wasmuth JJ (1988). Isolation of DNA markers in the direction of the Huntington disease gene from the G8 locus. Am J Hum Genet 42:335-344.
9. Pohl M, Zimmer M, MacDonald ME, Smith B, Bucan M, Poustka A, Volinia S (1988). Construction of a NotI library and isolation of new markers close to the Huntington disease gene. Nucl Acids Res 16:9165-9198.
10. Weber B, Hedrick A, Andrew S, Riess O, Collins C, Kowbel D, Hayden MR (1992). Isolation and characterization of new highly polymorphic DNA markers from a candidate region for the Huntington disease gene. Am J Hum Genet 50:382-393.
11. Pritchard CA, Casher D, Uglum E, Cox DR, Myers RM (1989). Isolation and field-inversion gel electrophoresis analysis of DNA markers located close to the Huntington disease gene. Genomics 4:408-418.
12. Taylor SAM, Barnes GT, MacDonald ME, Gusella JF (1992). A dinucleotide repeat polymorphism at the D4S127 locus. Nucl Acids Res 1:142.
13. Snell RG, Youngman S, Lehrach H, Sarfarazi M, Harper PS, Shaw DJ (1989). A new probe (2R3) in the region of Huntington's disease. Cytogenet Cell Genet 51:1083.
14. Whaley WL, Michiels F, MacDonald ME, Romano D, Zimmer M, Smith B, Leavitt J, Bucan M, Haines JL, Gilliam TC, Zehetner G, Smith C, Cantor CR, Frischauf AM, Wasmuth JJ, Lehrach H, Gusella JF (1988). Mapping of D4S98/S114/S113 confines the Huntington's defect to a reduced physical region at the telomere of chromosome 4. Nucl Acids Res 16:11769-11780.
15. Robbins C, Theilmann J, Youngman S, Haines J, Altherr MJ, Harper PS, Payne C, Junker A, Wasmuth J, Hayden MR (1989). Evidence from family studies that the gene causing Huntington disease is telomeric to D4S95 and D4S90. Am J Hum Genet 44:422-425.
16. MacDonald ME, Lin C, Srinidhi L, Bates G, Altherr M, Youngman S, Whaley WL, Wexler N, Bucan M, Allitto BA, Smith B, Leavitt J, Poustka A, Harper P, Lehrach H, Wasmuth JJ, Frischauf AM, Gusella JF (1989). Recombination events suggest potential sites for the Huntington disease gene. Neuron 3:183-190.
17. Weber B, Collins C, Kowbel D, Riess O, Hayden MR (1991). Identification of multiple CpG islands and associated conserved sequences in a candidate region for the Huntington disease gene. Genomics 11:1113-1124.
18. Feinberg AP and Vogelstein B (1983). A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal Biochem 132:6-13.
19. Sambrook J, Fritsch EF, Maniatis T (1989). Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Press, Cold Spring Harbor, NY.
20. Atkinson T and Smith M (1984). Solid phase synthesis of oligodeoxyribonucleotides by the phosphite-triester method. In: Gait MJ (ed), Oligonucleotide synthesis: a practical approach. IRL Press, Oxford, pp 35-81.
21. Litt M and Jorde LB (1986). Linkage disequilibrium between pairs of loci within a highly polymorphic region of chromosome 2q. Am J Hum Genet 39:166-178.
22. Hill WG and Robertson A (1968). Linkage disequilibrium in finite populations. Theor Appl Genet 38:226-231.
23. Hill WG (1975). Linkage disequilibrium among multiple neutral alleles produced by mutation in finite populations. Theor Popul Biol 8:117-126.
24. Weir BS and Cockerham CC (1978). Testing hypotheses about linkage disequilibrium with multiple alleles. Genetics 88:633-642.
25. Yule GU and Kendall MG (1968). An introduction to the theory of statistics, 14th ed. Charles Griffin, London.
26. Draper N and Smith H (1981). Applied regression analysis, 2nd ed. Wiley, New York.

CHAPTER 3

NONRANDOM ALLELIC ASSOCIATION

The work presented in this chapter has contributed to two publications.

Andrew SE, Theilmann J, Hedrick A, Mah D, Weber B, Hayden MR (1992). Nonrandom allele association between Huntington disease and two loci separated by about 3 Mb on 4p16.3. Genomics 13:301-311.

Weber B, Hedrick A, Andrew SE, Riess O, Collins C, Kowbel D, Hayden MR (1991). Isolation and characterization of new highly polymorphic DNA markers from a candidate region for the Huntington disease gene. Am J Hum Genet 50:382-393.

3.1 INTRODUCTION

3.1.1 ALLELIC ASSOCIATION

Nonrandom allelic association, or linkage disequilibrium, is the nonrandom association of an allele at one locus with an allele at another locus [1]. As a measure of the deviation of the frequencies of pairs of alleles at two loci from those expected, linkage disequilibrium measures the degree of departure from equilibrium [1] and reflects disturbing forces including selection, migration, or mutation [2].

This principle has been adapted from population biology and applied to the search for disease genes by using the disease locus as one locus and linked markers as the second locus. In these studies, nonrandom allelic association between a marker and a disease locus has been assessed by comparing allele frequencies in a group of affected patients with those of a normal control population. Since loci located closest to the site of mutation undergo recombination less frequently than those at a greater distance, they are likely to exhibit a higher degree of nonrandom allelic association.
Therefore, linkage disequilibrium reflects, and exploits, the effects of recombination over many previous generations [3]. Since association is evolutionarily related to recombination, and thus to distance, an increase in the measure of association as one moves along a chromosome is indicative of movement towards the disease gene.

Kan and Dozy were the first to demonstrate that an RFLP close to the β-globin gene was in strong nonrandom allelic association with sickle cell anemia in American blacks [4]. More recently, linkage disequilibrium mapping has led to the refinement of candidate regions and subsequent cloning of the genes for Friedreich ataxia [5], myotonic dystrophy [6], Wilson's disease [7], cystic fibrosis (CF) [8] and others. A classic example of the role of association data in localizing a disease gene was in CF, where increasing measures of nonrandom allelic association were used to localize the CF gene to an 800-kb region, allowing subsequent identification of the CFTR gene from the region with the highest values of association [8]. Linkage disequilibrium was also used to reduce the likely candidate region for the diastrophic dysplasia gene to 60 kb, from a previously determined candidate region of 1.6 Mb whose size had been limited by a lack of informative meioses [9].

There are several requirements for appropriate nonrandom allelic association analyses. First, locus heterogeneity must not be present: mutations in more than one gene would weaken any chance of detecting association between the disease and any one particular marker allele. Second, the disease must have a low mutation rate, ideally with the majority of affected chromosomes descended from one mutation on one ancestral haplotype. Multiple mutations within the same gene may have occurred on distinct chromosomal backgrounds, making it difficult to detect association.
Another important consideration is that the disease be prevalent enough to allow collection of a sufficient number of affected individuals to establish significant results.

The genetic features of HD fit the above criteria, allowing the appropriate use of association studies to aid in refining the candidate region. Conneally et al. showed evidence for a lack of locus heterogeneity in HD [10], and the mutation rate for HD was estimated to be one of the lowest of all known human genetic diseases [11]. In addition, the collection in Vancouver of over 1000 affected individuals from more than 500 families with HD provided an invaluable resource for analysis.

The relationship between the physical distance separating markers and the amount of linkage disequilibrium is not consistent. Theoretically, disequilibrium and distance have an inverse relationship,

$$r^2 = \frac{1}{1 + 4 N_e c}$$

where r is a standard measure of disequilibrium, $N_e$ is the effective population size and c is the recombination fraction [12, 13]. However, there are several reasons why this is not always the case. The relationship holds only if $4 N_e c$ is large, and therefore may not be true in small genomic regions. In addition, the measure of disequilibrium depends on the rate of recombination, which is not constant across the genome [14], and measures of disequilibrium reflect this unequal rate of recombination along a chromosome.

An examination of disequilibrium between pairs of loci on chromosome 2q demonstrated that "although a regular relationship between disequilibrium and physical distance may occur in some small chromosomal regions, it cannot necessarily be expected to exist" [15].
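To see how quickly this theoretical expectation falls with the recombination fraction, the formula can be evaluated directly. The sketch below assumes a hypothetical effective population size of N_e = 10,000, and the recombination fractions are illustrative only (at roughly 1 cM per Mb, a human genome-wide average, c = 0.03 corresponds to about 3 Mb). The function name is hypothetical.

```python
def expected_r2(ne, c):
    """Equilibrium expectation r^2 = 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * ne * c)


NE = 10_000  # hypothetical effective population size
for c in (0.0, 0.0001, 0.001, 0.01, 0.03):
    print(f"c = {c:<7} 4*Ne*c = {4 * NE * c:>7.1f}  r^2 = {expected_r2(NE, c):.4f}")
```

For this N_e the expected r² falls from 1 at c = 0 to about 0.02 at c = 0.001 and below 0.001 at c = 0.03.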
A reasonably uniform relationship between physical distance and disequilibrium across small genomic regions has been observed in some cases [16-22], whereas in other instances [15, 23-34] no consistent relationship is maintained.

For comparisons involving loci with more than two alleles, testing for nonrandom allelic association can be carried out on the complete set of data, with more than one degree of freedom. One disadvantage of multi-allelic markers is the decreased number of individuals in each category for analysis; pooling alleles is therefore one method of countering the problem of small numbers. The most common allele at one locus is defined as one allele and the remaining alleles are pooled to form a second. However, the relationship between pooling alleles and detection of nonrandom association is not clear, and pooling can alter the statistical power of the test, so that the chance of detection may or may not be decreased [35]. Another method, the summation approach, provides an additional measure of disequilibrium for multiallelic systems [35, 36].

3.1.2 ALLELIC ASSOCIATION IN THE HUNTINGTON DISEASE CANDIDATE REGION

The small number of informative recombinant families, the discrepancies in the recombinant data, and the resulting enormity of the problem of analyzing hundreds of candidate genes argued that an alternative approach, such as the use of nonrandom allelic association, was necessary to refine the localization of the HD gene.

Early studies demonstrated conflicting and confusing deviations in allele frequencies in different populations (Table 3-1, Figure 3-1). Five previous studies measuring nonrandom allelic association have supported a proximal location for the HD gene [37-41]. Strong linkage disequilibrium between HD and D4S95 (AccI and MboI polymorphisms) was observed in all populations tested.
Significant association was seen with other markers in this region, namely D4S180, D4S127, D4S182 and D4S43 [38]. Linkage disequilibrium with D4S98 (SstI polymorphism) was seen only in some populations, particularly those of English and Scottish descent [37, 38, 40]. Interestingly, association initially seen in one population was no longer observed when the sample size was increased four-fold, emphasizing the importance of a large sample size [39, 41]. Analysis in an Italian population showed association between HD and the proximal markers D4S10, D4S127 and D4S43, although these results may be spurious owing to the small number of individuals in both cohorts [42].

Many other markers had been tested in the previous studies, such as the proximal locus D4S62 and the more telomeric loci D4S141 and D4S90; however, no association was seen in the primarily European data sets. Failure to detect nonrandom allelic association does not always imply its absence, particularly when allele frequencies are unequal and HD segregates with the frequent allele, or when the sample size is small.

Thus, the patterns of linkage disequilibrium across the 4p16.3 region were complex, showing neither a continuous trend towards a peak measure of disequilibrium nor a sharp increase suggestive of the region of the gene. In addition, at the time of this study the values of association were relatively weak, suggesting that the markers used were still some distance from the site of mutation, and factors such as a significant recombination rate clouded the analysis.

Table 3-1. Published data for markers showing nonrandom allelic association with Huntington disease.

MARKER (LOCUS)-ENZYME | POPULATION | No. HD CHROMOSOMES | No. CONTROL CHROMOSOMES | SIGNIFICANCE (P VALUE) | REFERENCE
pk082 (D4S10)-HindIII | Italian | 39 | 174 | 0.009 | 42
(D4S180)-BamHI | W. European | 56 | 117 | 0.0077 | 38
BJ56 (D4S127)-PvuII | W. European | 58 | 128 | 0.00037 | 38
BJ56 (D4S127)-PvuII | Italian | 19 | 129 | 0.014 | 42
BJ56 (D4S127)-StuI | W. European | 55 | 122 | 0.017 | 38
674 (D4S95)-AccI | UK | 41 | 97 | 0.0018 | 37
674 (D4S95)-AccI | multiple origins | 67 | 275 | 0.005 | 41
674 (D4S95)-AccI | multiple origins | 104 | 475 | 0.0016 | 39
674 (D4S95)-AccI | W. European | 51 | 112 | 0.032 | 38
674 (D4S95)-MboI | UK | 51 | 90 | 0.0011 | 37
674 (D4S95)-MboI | multiple origins | 51 | 188 | 0.007 | 41
674 (D4S95)-MboI | multiple origins | 97 | 370 | 0.00000 | 39
674 (D4S95)-MboI | Scottish | 53 | 278 | 0.00095 | 40
(D4S182)-EcoT22I | W. European | 53 | 111 | 0.012 | 38
S7 (D4S43)-HindIII | W. European | 59 | 112 | 0.034 | 38
S1.5 (D4S43)-TaqI | Italian | 21 | 121 | 0.021 | 42
731 (D4S98)-SstI | UK | 53 | 102 | 0.00029 | 37
731 (D4S98)-SstI | multiple origins | 24 | 92 | 0.026 | 41
731 (D4S98)-SstI | multiple origins | 106 | 398 | 0.158 | 39
731 (D4S98)-SstI | Scottish | 56 | 260 | 0.0043 | 40
731 (D4S98)-SstI | W. European | 72 | 153 | 0.11 | 38

Figure 3-1. A schematic map of 4p16.3 showing approximate distances between markers (compiled from Chapter 2, references 4 to 13). [Figure: schematic physical map of the markers, from 4cen to 4pter.]

The small sample sizes used in some of the previous studies necessitated re-examination in a larger data set to either confirm or refute the previous findings. The aim of this study was to analyze a large number of markers spanning the 6 Mb candidate region between D4S62 and D4S90, using a large cohort of affected individuals and controls, most of UK descent, to refine the most likely candidate regions.
Furthermore, newly identified distal markers were included in this study to add to the understanding of the pattern of association across the region and to aid in the localization of the HD gene.

3.2 RESULTS

3.2.1 IDENTIFICATION OF NEW DISTAL POLYMORPHIC PROBES

A 460 kb segment of contiguously overlapping DNA had previously been constructed across the distal candidate region from D4S111 to D4S228 [43]. Random fragments from one distal cosmid containing the recombination breakpoint in the sporadic patient previously mentioned were isolated and hybridized to panels of DNA from six unrelated individuals digested with at least 10 restriction enzymes. Testing of approximately 10 random fragments identified 3 unique and polymorphic probes from within the distal candidate region for the HD gene (Table 3-2). BS1, a single copy 3.3 kb subclone, detects an MboI polymorphism at the previously identified D4S133 locus [44].

The allele frequencies were calculated from unaffected chromosomes in families with HD, as determined for the nonrandom allelic association analysis. From the allele frequencies, the heterozygote frequency and the polymorphism information content (PIC) were calculated for each RFLP [45] (Table 3-2).

Table 3-2. Heterozygosity of three new probes for allelic association studies.

LOCUS | PROBE | NAME | ENZYME | ALLELE: SIZE (kb), No. OF CHROMOSOMES, ALLELE FREQUENCY | HETEROZYGOTE FREQUENCY | PIC*
D4S228 | cl6Dp/E2Rep | E2 | SstI | 0: 5, 4, 0.006; 1: 4.8, 571, 0.865; 2: 4.5, 17, 0.026; 3: 3.8, 68, 0.103 | 0.24 | 0.22
D4S228 | cl6Dp/M4.2 | 4.2 | SstI | A: 8, 80, 0.15; B: 6.5, 444, 0.85 | 0.26 | 0.22
D4S133 | cl6Dp/BS3.3 | BS1 | MboI | A: 0.8, 65, 0.16; B: 0.7, 345, 0.84 | 0.26 | 0.22

*Source: Botstein D, White RL, Skolnick M, Davis RW (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.

All three probes are only slightly informative, with PIC values of 0.22. E2 is likely to be an insertion/deletion polymorphism, as it is detected by several other restriction enzymes.
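The heterozygote frequencies and PIC values in Table 3-2 can be recomputed from the allele counts using the formulas of Botstein et al. (1980), the source cited for the table: H = 1 - Σ p_i² and PIC = H - Σ_{i<j} 2 p_i² p_j². The Python sketch below assumes those formulas; the function names are hypothetical, and because it starts from raw counts rather than rounded frequencies, its output may differ from the table in the last digit.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygote frequency, 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980):
    PIC = 1 - sum(p_i^2) - sum over i < j of 2 * p_i^2 * p_j^2."""
    return heterozygosity(freqs) - sum(
        2.0 * (pi * pi) * (pj * pj) for pi, pj in combinations(freqs, 2)
    )


def freqs_from_counts(counts):
    """Convert allele (chromosome) counts to relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


# Allele counts on unaffected chromosomes, as listed in Table 3-2.
for name, counts in {"E2": [4, 571, 17, 68], "4.2": [80, 444], "BS1": [65, 345]}.items():
    f = freqs_from_counts(counts)
    print(f"{name}: H = {heterozygosity(f):.3f}  PIC = {pic(f):.3f}")
```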
The enzyme used for analysis of E2 in this chapter is SstI. The BS1 and 4.2 polymorphisms appear to be due to single base-pair substitutions at their respective restriction recognition sites.

3.2.2 NONRANDOM ALLELIC ASSOCIATION ACROSS 6 MB

A total of 134 families with HD were used for the analysis, providing 860 chromosomes from 134 affected and 313 unaffected individuals. Control chromosomes were obtained from unaffected spouses and from the canonical unaffected chromosome of affected patients. For this analysis it was assumed that unaffected spouses were of similar ancestry to their affected partners.

A total of 17 RFLPs detected by 13 probes were examined. All polymorphisms used were detected by previously characterized probes and are summarized in Table 3-3. Probes used for re-analysis of nonrandom association include 2R3 (D4S141), 678 (D4S96), 731 (D4S98), 674 (D4S95), G8 (D4S10) and P8 (D4S62). 157 (D4S111) and S7 (D4S43) had not been tested on this cohort before. Probes Rs3 (D4S227), Ac2 (D4S227), BS1 (D4S133), E2 (D4S228) and 4.2 (D4S228) are newly identified probes not previously examined for association with HD. A physical map showing markers and approximate distances between them is shown in Figure 3-1.

The allele frequencies for RFLPs on HD and control chromosomes, and measures of allelic association for all 17 RFLPs analyzed, are shown in Table 3-4. The Bonferroni procedure was used to adjust for multiple comparisons [46]. The adjusted significance level is

$$\alpha' = 1 - (1 - \alpha)^{1/17}$$

so that, with α = 0.05, a p value of 0.003 is considered significant in this analysis.

Table 3-3. Polymorphic markers used in association analysis.

PROBE | LOCUS | ENZYME | ALLELE: SIZE (kb)
P8 | D4S62 | HincII | A: 3.7; B: 4.6; C: 4.9; D: 5.3
G8 | D4S10 | HindIII | A: 13.0/3.7; B: 13.0/4.9; C: 11.0/3.7; D: 11.0/4.9
G8 | D4S10 | EcoRI | A: 11.0; B: 9.0
G8 | D4S10 | BglI | A: 2.2; B: 2.1
pBS674E-D (674) | D4S95 | AccI | A: 6.8; B: 1.5
pBS674E-D (674) | D4S95 | TaqI | A: 2.3; B: 1.75
pBS674E-D (674) | D4S95 | MboI | A: 1.3; B: 0.7/0.6
S7 | D4S43 | TaqI | A: 3.4; B: 2.3
pBS731B-C (731) | D4S98 | SstI | A: 2.8; B: 2.5; C: 10.0
pBS678 (678) | D4S96 | MspI | A: 1.4; B: 1.0
p157-9 (157) | D4S111 | PstI | A: 1.9; B: 2.1; C: 2.2; D: 2.3
Rs3 | D4S227 | MspI | A: 0.75; B: 0.9
Ac2 | D4S227 | AccI | 1: 2.0; 2: 1.8; 3: 1.7
BS1 | D4S133 | MboI | A: 0.8; B: 0.7
cl6Dp/E2Rep (E2) | D4S228 | SstI | 0: 5.0; 1: 4.8; 2: 4.5; 3: 3.8
cl6Dp/M4.2 (4.2) | D4S228 | SstI | A: 8.0; B: 6.5
2R3 | D4S141 | HindIII | A: 2.5; B: 1.6

Table 3-4. Allele frequencies for RFLPs on HD and non-HD chromosomes.

MARKER (LOCUS) | ENZYME | HD CHROMOSOMES, allele: No. (%) | NON-HD CHROMOSOMES, allele: No. (%) | r | χ² | P | Yule's Q
P8 (D4S62) | HincII | A: 67 (73.6), B: 14 (15.4), C: 10 (11.0), D: 0 (0.0); total 91 | A: 329 (76.0), B: 62 (14.3), C: 39 (9.0), D: 3 (0.7); total 433 | .021 | .23 | .63 | .06
G8 (D4S10) | HindIII | A: 62 (75.6), B: 5 (6.1), C: 12 (14.6), D: 3 (3.7); total 82 | A: 276 (70.4), B: 38 (9.7), C: 70 (17.9), D: 8 (2.0); total 392 | .044 | .92 | .34 | .13
G8 (D4S10) | EcoRI | A: 31 (38.3), B: 50 (61.7); total 81 | A: 182 (48.5), B: 193 (51.5); total 375 | .079 | 2.85 | .091 | .21
G8 (D4S10) | BglI | A: 47 (56.6), B: 36 (43.4); total 83 | A: 248 (62.9), B: 146 (37.1); total 394 | .047 | 1.05 | .31 | .13
674 (D4S95) | AccI | A: 91 (81.2), B: 21 (18.8); total 112 | A: 348 (67.7), B: 166 (32.3); total 514 | .11 | 8.14 | .0043 | .35
674 (D4S95) | TaqI | A: 38 (34.5), B: 72 (65.5); total 110 | A: 189 (36.0), B: 336 (64.0); total 525 | .011 | .077 | .78 | .03
674 (D4S95) | MboI | A: 93 (80.9), B: 22 (19.1); total 115 | A: 319 (63.9), B: 180 (36.1); total 499 | .14 | 12.0 | .00053 | .41
S7 (D4S43) | TaqI | A: 7 (10.1), B: 62 (89.9); total 69 | A: 67 (16.8), B: 331 (83.2); total 398 | .062 | 1.80 | .18 | .28
731 (D4S98) | SstI | A: 21 (18.1), B: 95 (81.9), C: 0 (0.0); total 116 | A: 73 (14.4), B: 431 (84.8), C: 4 (0.8); total 508 | .032 | .64 | .42 | .11
678 (D4S96) | MspI | A: 62 (55.9), B: 49 (44.1); total 111 | A: 283 (55.2), B: 230 (44.8); total 513 | .013 | .11 | .74 | .01
157 (D4S111) | PstI | A: 10 (12.7), B: 49 (62.0), C: 19 (24.0), D: 1 (1.3); total 79 | A: 45 (12.3), B: 221 (60.4), C: 97 (26.5), D: 2 (0.5); total 366 | .018 | .14 | .71 | .03
Rs3 (D4S227) | MspI | A: 10 (62.5), B: 6 (37.5); total 16 | A: 45 (64.3), B: 25 (35.7); total 70 | .0085 | .0062 | .94 | .04
Ac2 (D4S227) | AccI | 1: 16 (66.7), 2: 7 (29.2), 3: 1 (4.1); total 24 | 1: 55 (56.1), 2: 35 (35.7), 3: 8 (8.2); total 98 | .082 | .82 | .37 | .22
BS1 (D4S133) | MboI | A: 4 (4.1), B: 94 (95.9); total 98 | A: 65 (15.9), B: 345 (84.1); total 410 | .13 | 9.0 | .0027 | .63
E2 (D4S228) | SstI | 0: 0 (0.0), 1: 105 (95.5), 2: 2 (1.8), 3: 3 (2.7); total 110 | 0: 3 (0.6), 1: 447 (84.7), 2: 15 (2.8), 3: 63 (11.9); total 528 | .12 | 8.58 | .0034 | .58
4.2 (D4S228) | SstI | A: 6 (5.5), B: 103 (94.5); total 109 | A: 74 (17.4), B: 350 (82.6); total 424 | .14 | 10.3 | .0013 | .57
2R3 (D4S141) | HindIII | A: 11 (34.4), B: 21 (65.6); total 32 | A: 95 (40.8), B: 138 (59.2); total 233 | .038 | .37 | .54 | .14

All P values calculated with 1 degree of freedom (1 df). Calculations for multi-allelic markers were done with the major allele vs. pooled minor alleles. Yule's Q is given as an absolute value (|Q|).

The allele frequencies for the non-HD chromosomes concur with previously reported results [37, 39, 41]. The previously reported nonrandom association between the HD gene and the RFLPs detected with D4S95 (AccI and MboI) is confirmed in this analysis [39, 41]. Three newly tested probes [BS1 (D4S133), E2 and 4.2 (D4S228)] detect RFLPs that demonstrate statistically significant nonrandom association with the HD gene (Table 3-4). All three markers lie within a 30 kb region (D4S133/D4S228). All other markers tested show random association with HD (Table 3-4). The measure of disequilibrium as determined by the significance of the r value is consistent with Yule's coefficient, with greater significance corresponding to a greater Yule's coefficient.

For multi-allelic markers, calculations were done with the major allele vs. pooled minor alleles [15]. Pooling of alleles for linkage disequilibrium analysis may alter the power of detecting linkage disequilibrium. A second approach was therefore taken to determine linkage disequilibrium between multiallelic loci, using a summation approach [35]. In this analysis, the power of detecting linkage disequilibrium showed some differences depending on the method used (Table 3-5): for some markers p values were increased and for others decreased. This calls for caution in interpreting data of only marginal significance with either method.

3.2.3 HAPLOTYPE ANALYSIS

Another significant concern in the interpretation of the data in Table 3-4 is that 17 tests have been performed, which increases the type I error. Taking this into account, the appropriate p value chosen for significance would be 0.003.
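The pooled-allele and summation methods compared in Table 3-5 can be sketched in a few lines of Python, assuming the summation formula given in section 2.8. The counts below are hypothetical, and `summation_chi2` and `pool_major` are illustrative names, not the program used for this thesis.

```python
def summation_chi2(counts):
    """Summation chi-square for disequilibrium between two multi-allelic loci.

    counts[i][j] is the number of gametes carrying allele A_i and allele B_j.
    Returns (chi2, df) with chi2 = N * sum_ij D_ij^2 / (p_i q_j),
    D_ij = P_ij - p_i q_j and df = (m - 1)(n - 1). Alleles that are never
    observed are dropped before m, n and df are counted.
    """
    rows = [row for row in counts if sum(row) > 0]
    keep = [j for j in range(len(rows[0])) if sum(row[j] for row in rows) > 0]
    table = [[row[j] for j in keep] for row in rows]
    n_total = sum(sum(row) for row in table)
    p = [sum(row) / n_total for row in table]                        # locus A
    q = [sum(row[j] for row in table) / n_total for j in range(len(keep))]  # locus B
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            d_ij = count / n_total - p[i] * q[j]
            chi2 += d_ij ** 2 / (p[i] * q[j])
    return n_total * chi2, (len(p) - 1) * (len(q) - 1)


def pool_major(counts):
    """Collapse the marker alleles (columns) to major allele vs. all others."""
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    major = totals.index(max(totals))
    return [[row[major], sum(row) - row[major]] for row in counts]


# Hypothetical counts: rows are HD and control chromosomes, columns are the
# four alleles of a multi-allelic marker (not data from this thesis).
counts = [[2, 90, 3, 5], [10, 400, 20, 70]]
print("summation:", summation_chi2(counts))              # 3 df
print("pooled:   ", summation_chi2(pool_major(counts)))  # 2 x 2 table, 1 df
```

Applied to a pooled 2 × 2 table, the summation statistic reduces exactly to Nr², so the same function also reproduces the pooled method; algebraically, it is identical to Pearson's chi-square for the gamete contingency table.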
To analyze the significance of these results further, regions that showed significant or marginally significant results in Table 3-4 were examined by haplotype analysis. The haplotypes were analyzed for allelic association, and the results are shown in Table 3-6. The haplotypes constructed from the AccI and MboI alleles detected by marker D4S95, and the haplotypes resulting from the combination of the BS1 (D4S133), E2 and 4.2 (D4S228) alleles, both show significant nonrandom association with HD, confirming the findings in Table 3-4.

Table 3-5. Comparison of two methods for determination of nonrandom allelic association for multi-allelic RFLPs.

MARKER (LOCUS) | ENZYME | No. OF ALLELES | A) POOLED: r | A) POOLED: χ² | A) POOLED: df | A) POOLED: P | B) SUMMATION: χ² | B) SUMMATION: df | B) SUMMATION: P
P8 (D4S62) | HincII | 4 | .021 | .23 | 1 | .63 | 1.18 | 3 | .76
G8 (D4S10) | HindIII | 4 | .044 | .92 | 1 | .34 | 2.21 | 3 | .53
731 (D4S98) | SstI | 3 | .032 | .64 | 1 | .42 | 1.89 | 2 | .39
157 (D4S111) | PstI | 4 | .018 | .14 | 1 | .71 | 0.62 | 3 | .89
Ac2 (D4S227) | AccI | 3 | .082 | .82 | 1 | .37 | 1.04 | 2 | .59
E2 (D4S228) | SstI | 4 | .120 | 8.58 | 1 | .0034 | 10.79 | 3 | .013

df = degrees of freedom.

Table 3-6. Haplotypes of HD chromosomes and control chromosomes (calculated with the major haplotype vs. other haplotypes pooled).

LOCUS (RFLPs) | EXPECTED CONTROL CHROMOSOMES, haplotype: No. (%) | OBSERVED CONTROL CHROMOSOMES, haplotype: No. (%) | OBSERVED HD CHROMOSOMES, haplotype: No. (%) | EXPECTED VS OBSERVED CONTROL | HD VS OBSERVED CONTROL
D4S95 (674/AccI-674/MboI) | AA: 208 (43.3), AB: 117 (24.4), BA: 99 (20.6), BB: 56 (11.7); total 480 | AA: 294 (61.3), AB: 33 (6.9), BA: 13 (2.7), BB: 140 (29.2); total 480 | AA: 87 (80.5), AB: 1 (0.9), BA: 2 (1.9), BB: 18 (16.7); total 108 | χ² = 30.2, 1 df, P = 0.00000 | χ² = 13.6, 1 df, P = 0.00023
D4S228/D4S133 (4.2/E2/BS1) | B1B: 145 (58.7), other haplotypes: 102 (41.3); total 247 | B1B: 210 (85.0), other haplotypes: 37 (15.0); total 247 | B1B: 84 (97.7), other haplotypes: 2 (2.3); total 86 | χ² = 41.0, 1 df, P = 0.00000 | χ² = 8.7, 1 df, P = 0.0032
D4S228/D4S133 (E2/BS1) | 1A: 38 (13.6), 1B: 199 (71.2), 2A: 3 (0.4), 2B: 7 (2.1), 3A: 5 (1.9), 3B: 28 (10.0), 0A: 0 (0.1), 0B: 1 (0.5); total 280 | 1A: 1 (0.4), 1B: 243 (86.8), 2A: 3 (1.1), 2B: 4 (1.4), 3A: 26 (9.3), 3B: 1 (0.4), 0A: 0 (0.0), 0B: 2 (0.6); total 280 | 1A: 2 (2.3), 1B: 84 (97.7), all other haplotypes 0; total 86 | χ² = 19.9, 1 df, P = 0.00001 | χ² = 7.1, 1 df, P = 0.0078
D4S228/D4S133 (4.2/BS1) | AA: 8 (2.8), AB: 40 (14.6), BA: 36 (13.1), BB: 189 (69.5); total 272 | AA: 25 (9.2), AB: 5 (1.8), BA: 6 (2.2), BB: 236 (86.8); total 272 | AA: 1 (1.2), AB: 0 (0.0), BA: 1 (1.2), BB: 84 (98.6); total 86 | χ² = 22.8, 1 df, P = 0.00000 | χ² = 7.1, 1 df, P = 0.0078
D4S228 (E2/4.2) | 1A: 57 (14.7), 1B: 271 (70.0), 2A: 2 (0.5), 2B: 9 (2.3), 3A: 8 (2.1), 3B: 38 (9.8), 0A: 0 (0.1), 0B: 2 (0.5); total 387 | 1A: 4 (1.0), 1B: 329 (85.0), 2A: 4 (1.0), 2B: 4 (1.0), 3A: 36 (9.3), 3B: 8 (2.1), 0A: 0 (0.0), 0B: 2 (0.5); total 387 | 1A: 1 (1.0), 1B: 93 (94.9), 2A: 1 (1.0), 3A: 3 (3.1), all other haplotypes 0; total 98 | χ² = 24.1, 1 df, P = 0.00000 | χ² = 5.9, 1 df, P = 0.015

In addition, to investigate nonrandom association between markers within each cluster, pairwise haplotypes were analyzed (Table 3-6). The AccI and MboI RFLPs at D4S95 are in very tight association with each other on HD and non-HD chromosomes, as might be expected from their physical proximity. Likewise, the three RFLPs clustered within 30 kb around D4S228 and D4S133 are in strong association with each other on HD and non-HD chromosomes.

To determine whether the two loci demonstrating association with HD are also associated with each other, haplotypes of the two loci were analyzed on both HD and control chromosomes (Table 3-7). It is noteworthy that 674 (D4S95, AccI and MboI) is in strong nonrandom association with the RFLPs detected at loci D4S228 and D4S133 on HD chromosomes, but not on control chromosomes.

3.2.4 ANALYSIS OF HOMOGENEOUS CHROMOSOMES

The families in this study represent many different ancestries, and measures of nonrandom association may be affected by possible multiple origins of the HD mutation on chromosomes with different haplotypes. To adjust for effects from different populations, allele counts were repeated using only canonical non-HD chromosomes as controls. For markers tested in a sufficient number of individuals for this analysis, the measures of nonrandom allelic association are shown in Table 3-8. Although the significance measured by the p value has decreased overall because of the smaller data set, significant nonrandom association is still present for the RFLP at D4S228 (E2, SstI).
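The significance of the association (p value) and the measure of association calculated as Yule's coefficient (|Q|) are both shown in Table 3-8, confirming the presence of two physically distinct regions in nonrandom association with the HD gene.

Table 3-7. Haplotypes between RFLPs at D4S95 and D4S228/D4S133 for HD and non-HD chromosomes.

HAPLOTYPE (RFLPs) | EXPECTED CONTROL CHROMOSOMES, haplotype: No. (%) | OBSERVED CONTROL CHROMOSOMES, haplotype: No. (%) | OBSERVED HD CHROMOSOMES, haplotype: No. (%) | OBSERVED VS EXPECTED (CONTROLS) | HD VS CONTROLS
674/AccI-BS1 | AB: 156 (56.9), BB: 74 (27.2), AA: 30 (10.8), BA: 14 (5.1); total 274 | AB: 161 (58.8), BB: 79 (28.8), AA: 30 (10.9), BA: 4 (1.5); total 274 | AB: 61 (75.3), BB: 16 (20.8), AA: 3 (3.7), BA: 1 (1.2); total 81 | χ² = 0.12, 1 df, P = 0.73 | χ² = 6.6, 1 df, P = 0.010
674/AccI-4.2 | AB: 210 (55.9), BB: 101 (26.7), AA: 44 (11.8), BA: 21 (5.6); total 376 | AB: 218 (58.0), BB: 110 (29.2), AA: 41 (10.9), BA: 7 (1.9); total 376 | AB: 70 (77.5), BB: 16 (17.9), AA: 3 (3.4), BA: 0 (1.1); total 89 | χ² = 0.27, 1 df, P = 0.61 | χ² = 12.2, 1 df, P = 0.00048
674/AccI-E2 | A1: 218 (57.3), B1: 104 (27.4), A3: 31 (8.1), B3: 14 (3.8), A2: 7 (1.9), B2: 3 (0.9), A0: 2 (0.4), B0: 1 (0.2); total 380 | A1: 211 (55.5), B1: 112 (29.5), A3: 35 (9.2), B3: 9 (2.4), A2: 7 (1.8), B2: 4 (1.1), A0: 2 (0.5), B0: 0 (0.0); total 380 | A1: 76 (76.8), B1: 19 (19.2), A3: 2 (2.0), B3: 1 (1.0), A2: 1 (1.0), B2, A0, B0: 0; total 99 | χ² = 0.19, 1 df, P = 0.66 | χ² = 13.9, 1 df, P = 0.00019
674/MboI-BS1 | AB: 150 (53.7), BB: 85 (30.4), AA: 28 (10.2), BA: 16 (5.7); total 279 | AB: 155 (55.6), BB: 91 (32.6), AA: 27 (9.8), BA: 6 (2.2); total 279 | AB: 61 (72.6), BB: 17 (20.2), AA: 3 (3.6), BA: 1 (1.2); total 84 | χ² = 0.12, 1 df, P = 0.73 | 1 df, P = 0.0077
674/MboI-4.2 | AB: 197 (52.8), BB: 111 (29.8), AA: 42 (11.1), BA: 24 (6.3); total 374 | AB: 204 (54.5), BB: 124 (33.2), AA: 34 (9.1), BA: 12 (3.2); total 374 | AB: 70 (75.3), BB: 19 (20.4), AA: 3 (3.2), BA: 1 (1.1); total 93 | χ² = 0.19, 1 df, P = 0.66 | χ² = 12.4, 1 df, P = 0.00044
674/MboI-E2 | A1: 209 (54.1), B1: 118 (30.6), A3: 29 (7.6), B3: 18 (4.7), A2: 7 (1.8), B2: 4 (1.0), A0: 0 (0.0), B0: 1 (0.2); total 386 | A1: 203 (52.6), B1: 126 (32.6), A3: 30 (7.8), B3: 14 (3.6), A2: 6 (1.5), B2: 5 (1.3), A0: 1 (0.3), B0: 1 (0.3); total 386 | A1: 71 (74.0), B1: 21 (21.9), A3: 2 (2.1), B3: 1 (1.0), A2: 1 (1.0), B2, A0, B0: 0; total 96 | χ² = 0.13, 1 df, P = 0.72 | χ² = 13.5, 1 df, P = 0.00024

df = degree of freedom. χ² calculated with the major haplotype vs. pooled other haplotypes.

Table 3-8. Allele frequencies for RFLPs on HD and canonical non-HD chromosomes.

MARKER (LOCUS) | ENZYME | HD CHROMOSOMES, allele: No. (%) | CANONICAL NON-HD CHROMOSOMES, allele: No. (%) | r | χ² | P | Yule's Q
P8 (D4S62) | HincII | A: 67 (73.6), B: 14 (15.4), C: 10 (11.0), D: 0 (0.0); total 91 | A: 72 (79.1), B: 9 (9.9), C: 10 (11.0), D: 0 (0.0); total 91 | 0.056 | 0.57 | 0.45 | 0.15
G8 (D4S10) | HindIII | A: 62 (75.6), B: 5 (6.1), C: 12 (14.6), D: 3 (3.7); total 82 | A: 53 (68.0), B: 9 (11.5), C: 16 (20.5), D: 0 (0.0); total 78 | 0.080 | 1.02 | 0.31 | 0.19
G8 (D4S10) | EcoRI | A: 31 (38.3), B: 50 (61.7); total 81 | A: 42 (50.6), B: 41 (49.4); total 83 | 0.12 | 2.4 | 0.12 | 0.25
G8 (D4S10) | BglI | A: 47 (56.6), B: 36 (43.4); total 83 | A: 48 (57.8), B: 35 (42.2); total 83 | 0.02 | 0.07 | 0.79 | 0.03
674 (D4S95) | AccI | A: 91 (81.2), B: 21 (18.8); total 112 | A: 87 (75.6), B: 28 (24.4); total 115 | 0.42 | 4.00 | 0.046 | 0.17
674 (D4S95) | TaqI | A: 38 (34.5), B: 72 (65.5); total 110 | A: 42 (38.8), B: 66 (61.1); total 108 | 0.06 | 0.81 | 0.37 | 0.09
674 (D4S95) | MboI | A: 93 (80.9), B: 22 (19.1); total 115 | A: 78 (68.4), B: 36 (31.6); total 114 | 0.13 | 3.9 | 0.05 | 0.32
731 (D4S98) | SstI | A: 21 (18.1), B: 95 (81.9), C: 0 (0.0); total 116 | A: 13 (11.4), B: 99 (86.8), C: 2 (1.8); total 114 | 0.08 | 1.5 | 0.22 | 0.19
678 (D4S96) | MspI | A: 62 (55.9), B: 49 (44.1); total 111 | A: 50 (46.7), B: 57 (53.3); total 107 | 0.08 | 1.4 | 0.24 | 0.18
157 (D4S111) | PstI | A: 10 (12.7), B: 49 (62.0), C: 19 (24.0), D: 1 (1.3); total 79 | A: 12 (15.6), B: 46 (59.7), C: 19 (24.7), D: 0 (0.0); total 77 | 0.04 | 0.26 | 0.61 | 0.05
BS1 (D4S133) | MboI | A: 4 (4.1), B: 94 (95.9); total 98 | A: 12 (12.0), B: 88 (88.0); total 100 | 0.15 | 4.5 | 0.34 | 0.52
E2 (D4S228) | SstI | A: 0 (0.0), B: 105 (95.5), C: 2 (1.8), D: 3 (2.7); total 110 | A: 0 (0.0), B: 98 (85.2), C: 1 (0.9), D: 16 (13.9); total 115 | 0.20 | 9.0 | 0.0027 | 0.57
4.2 (D4S228) | SstI | A: 6 (5.5), B: 103 (94.5); total 109 | A: 16 (15.0), B: 91 (85.0); total 107 | 0.15 | 4.9 | 0.027 | 0.50

All P values calculated with one degree of freedom. Calculations for multi-allelic markers were done with the major allele vs. pooled minor alleles. Yule's Q is given as an absolute value (|Q|).

Another means of examining the strength of nonrandom allelic association within a more homogeneous population was to study families of United Kingdom (UK) origin separately. Markers demonstrating significant nonrandom association [674 (D4S95), BS1 (D4S133), E2 and 4.2 (D4S228)] were re-examined after classifying the origin of HD in each pedigree, and these data are presented in Table 3-9. Pedigrees in which the ancestry of the HD chromosome was unknown were not included in the analysis.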
Although statistical significance is lost, likely because of the smaller sample sizes, the measure of nonrandom association as determined by Yule's coefficient is greater for this more homogeneous population. The maximum Yule's coefficient determined in this study was obtained with the UK group tested with 4.2 (D4S228) (|Q| = 0.78).

3.3 DISCUSSION

Three polymorphic probes from D4S228 and D4S133 were identified and cloned, which allowed probes to be tested in allelic association analysis in a previously untested region of 4p16.3. The focus on 4p16.3 has resulted in the identification of genes such as the α-L-iduronidase gene⁴⁷, the β-subunit of the cyclic GMP phosphodiesterase gene⁴⁸ and the myosin light chain⁴⁹, and these markers may prove useful in future linkage studies for other diseases. In addition, in presymptomatic testing, the informativeness of an analysis depends on the heterozygosity of the DNA markers tested as well as on the family structure, so these markers may be useful for predictive testing for diseases caused by genes in 4p16.3.

The most important finding of this study is the discovery of significant nonrandom association between alleles for BS1 (D4S133), E2 (D4S228), 4.2 (D4S228) and HD,
which are separated by approximately 3 Mb from the site of previously identified nonrandom association detected by D4S95 (Fig. 3-1). In addition, the previously reported association with alleles for the AccI and MboI polymorphisms at D4S95 was confirmed with this larger data set. All other marker alleles tested were shown to be in random association with the HD gene. Therefore, two physically distinct regions (D4S133/D4S228 and D4S95), both containing markers in significant nonrandom association with the HD gene, have been identified.

Table 3-9. Allele frequencies and percentages for RFLPs on HD and non-HD chromosomes based on ancestry of the HD chromosome.
Marker (locus) | Enzyme | Ancestry | HD chromosomes, # (%) | Non-HD chromosomes, # (%) | r | χ² | P | |Q|
674 (D4S95) | AccI | UK | A 39 (81.3), B 9 (18.7); total 48 | A 148 (68.5), B 68 (31.5); total 216 | 0.12 | 3.8 | 0.051 | 0.33
674 (D4S95) | AccI | Non-UK | A 25 (73.5), B 9 (26.5); total 34 | A 102 (65.0), B 55 (35.0); total 157 | 0.08 | 1.2 | 0.27 | 0.20
674 (D4S95) | MboI | UK | A 37 (82.2), B 8 (17.8); total 45 | A 126 (60.6), B 82 (39.4); total 208 | 0.17 | 7.3 | 0.0069 | 0.50
674 (D4S95) | MboI | Non-UK | A 25 (73.5), B 9 (26.5); total 34 | A 98 (68.5), B 45 (31.5); total 143 | 0.044 | 0.34 | 0.56 | 0.12
BS1 (D4S133) | MboI | UK | A 1 (2.8), B 35 (97.2); total 36 | A 32 (15.2), B 178 (84.8); total 210 | 0.13 | 3.8 | 0.051 | 0.73
BS1 (D4S133) | MboI | Non-UK | A 2 (6.2), B 30 (93.8); total 32 | A 33 (16.5), B 167 (83.5); total 200 | 0.10 | 2.2 | 0.14 | 0.50
E2 (D4S228) | SstI | UK | A 0 (0.0), B 43 (97.7), C 0 (0.0), D 1 (2.3); total 44 | A 2 (0.75), B 230 (85.8), C 2 (0.75), D 34 (12.7); total 268 | 0.12 | 4.5 | 0.034 | 0.75
E2 (D4S228) | SstI | Non-UK | A 0 (0.0), B 29 (93.6), C 1 (3.2), D 1 (3.2); total 31 | A 1 (0.4), B 217 (83.5), C 13 (5.0), D 29 (11.1); total 260 | 0.09 | 2.2 | 0.14 | 0.48
4.2 (D4S228) | SstI | UK | A 1 (2.3), B 42 (97.7); total 43 | A 31 (16.4), B 158 (83.6); total 189 | 0.15 | 5.2 | 0.022 | 0.78
4.2 (D4S228) | SstI | Non-UK | A 3 (8.8), B 31 (91.2); total 34 | A 43 (18.3), B 192 (81.7); total 235 | 0.088 | 2.1 | 0.15 | 0.40
All P values calculated with one degree of freedom. Calculations for multi-allelic markers were done with the major allele vs. pooled minor alleles.

It should be noted that although reasonable steps to perform a rigorous statistical analysis have been taken, these statistics may be altered by varying frequencies of alleles in controls from different populations. In order to examine a more homogeneous population, families with an HD allele of UK origin were analyzed separately, confirming the presence of two regions in linkage disequilibrium with HD.
In addition, analysis of HD chromosomes with the patient's canonical chromosome from the unaffected parent as the control also showed two genomic regions in disequilibrium with HD. However, the assumption that controls are of similar ancestry to that of their affected partner may not be accurate, and measures of the differences in allele frequencies between HD and control chromosomes may therefore not reflect the true state. Thus, the inability to control accurately for the ancestry of the control chromosomes may be reflected in the measures of disequilibrium.

Nonrandom association between two loci separated by markers that do not show significant association has been reported previously. However, the distance between these loci has been relatively small. For example, RFLPs from the Apo AI-CIII-AIV gene cluster that are 4-23 kb apart are in strong linkage disequilibrium with each other but in equilibrium with RFLPs between them²³,²⁹. Similar findings were reported for the β-globin gene cluster, which contains two clusters of RFLPs separated by 9 kb, with disequilibrium between RFLPs within each cluster but no significant disequilibrium between the two clusters in a normal population⁵⁰. The findings of this study reveal two loci, separated by 3 Mb of DNA, each showing nonrandom association with a third locus, in this case the HD gene. In addition, haplotype analysis demonstrated that the two clusters of markers are in significant allelic association with each other on HD but not on control chromosomes.

The measures of association between D4S95 and D4S133/D4S228 show that alleles at the two loci, separated by 3 Mb, are in strong association with HD, and in association with each other on affected but not control chromosomes. If these results are not a statistical artifact, the reasons for two regions of disequilibrium separated by a large genomic distance, containing additional markers not in disequilibrium with the HD gene, are unknown.
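The haplotype comparisons discussed above (Table 3-7) contrast observed two-locus haplotype counts with the counts expected if alleles at the two loci combine independently, N·p_i·q_j. The sketch below is a minimal illustration under one assumption of ours: the allele frequencies p_i and q_j are estimated from the same haplotype counts. The thesis does not state which allele-frequency estimates underlie the expected column of Table 3-7, so these numbers are not expected to reproduce it. The helper `expected_haplotypes` is hypothetical, and the first character of each key is taken as the allele at the first locus.

```python
from collections import Counter


def expected_haplotypes(observed: dict[str, int]) -> dict[str, float]:
    """Expected two-locus haplotype counts under linkage equilibrium.

    Keys are two-character haplotypes such as "AB": the first character is
    the allele at locus 1, the second the allele at locus 2. Assumption:
    allele frequencies are estimated from these same haplotype counts.
    """
    n = sum(observed.values())
    locus1: Counter[str] = Counter()
    locus2: Counter[str] = Counter()
    for hap, count in observed.items():
        locus1[hap[0]] += count
        locus2[hap[1]] += count
    # Expected count N * p_i * q_j for every combination of alleles.
    return {
        a1 + a2: n * (locus1[a1] / n) * (locus2[a2] / n)
        for a1 in sorted(locus1)
        for a2 in sorted(locus2)
    }


# Observed control haplotypes for 674/AccI-BS1 (Table 3-7).
controls = {"AB": 161, "BB": 79, "AA": 30, "BA": 4}
for hap, exp in expected_haplotypes(controls).items():
    print(hap, controls.get(hap, 0), round(exp, 1))
```

The expected counts always sum to N, so any excess of one haplotype over expectation is balanced by deficits elsewhere; the χ² tests in Table 3-7 compare the major haplotype with the pooled remainder.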
In addition, the significance of markers in the two distinct regions being in disequilibrium with each other on HD chromosomes, but not on normal chromosomes, also remains unclear. These questions remain unanswered despite the identification and localization of the CAG trinucleotide repeat whose expansion is associated with HD.

The localization of the mutation associated with HD has allowed for retrospective analysis of these data and reassessment of previous association data. The strong linkage disequilibrium between HD and D4S95 (AccI and MboI polymorphisms) observed in all populations tested is not surprising given its close proximity to the HD mutation (120 kb). However, the marker D4S127 demonstrated slightly weaker association with HD³⁸ despite being only 30 kb from the CAG repeat. This may be due to the different nature of the polymorphism, with a multi-allelic system such as D4S127 having a higher mutation rate that makes association more difficult to detect, or it may reflect the population tested. The weak association seen with markers D4S180 and D4S182³⁸ is in keeping with their locations 250 kb centromeric and 300 kb, respectively, from the HD mutation. The associations observed between HD and D4S43, D4S98 and D4S133/D4S228 in several different populations are still unexplained, as these markers are located 900 kb, 1400 kb and 3 Mb, respectively, from the CAG repeat.

It is possible that expansion of the CAG repeat to the Huntington range is a relatively recent event, and that disequilibrium between two distal regions is the result of infrequent recombination between these two loci on Huntington chromosomes compared with normal chromosomes. This is consistent with a prior finding of a low recombination rate in this region on affected but not control chromosomes⁵¹. However, the tracing of families with HD even to the 17th century¹¹ is more in keeping with older origins for the HD gene. Other, as yet undefined, selective pressures might also be influencing these findings.
There may be some unknown, slight selective advantage of association of alleles for all chromosomes, which is seen more strongly on affected chromosomes than on control chromosomes because of a small number of ancestral Huntington chromosomes.

Specific sequences play a role in expansion of the CAG repeat, and there may be some association between distal sequences and repeat expansion. For example, in SCA, sequence specificity, in particular the homogeneity of the repeat sequence, is associated with expansion to the disease state. The occurrence of specific DM and FRAXA haplotypes also suggests that sequence differences may be very important⁵². If distally located sequences are involved in the instability of the CAG repeat on HD chromosomes, this may account for the associations observed distally on HD chromosomes.

One hypothesis prior to the cloning of the HD gene was that the causative gene may be extremely large, with exons spanning 3 Mb. The gene containing the CAG repeat expanded on HD chromosomes is large, spanning over 200 kb; however, it is localized between D4S180 and D4S127. The possibility that an important regulatory sequence on the HD chromosome, physically separated from the HD gene by an extensive distance, could account for the distal region of disequilibrium has yet to be definitively excluded.

Another hypothesis explaining two sites of association with HD is that the disease is the result of mutations in two independent but functionally related genes 3 Mb apart, with mutations at each location responsible for manifestation of the disease.
This is unlikely, however, as CAG expansion occurs in 99% of affected individuals, and linkage disequilibrium with a second locus would be undetectable with the small number of individuals not demonstrating CAG expansion.

The finding of linkage disequilibrium between two regions separated by 3 Mb, with no linkage disequilibrium between other markers in closer proximity to the markers at D4S95, D4S228 and D4S133, again raises concerns about the use of linkage disequilibrium in determining locus order. It has previously been suggested that loci located more distantly from one another undergo recombination more frequently and would therefore be expected to demonstrate less linkage disequilibrium than more closely linked loci¹². However, the findings of this study suggest that no predictable relationship exists between physical distance and modest measures of linkage disequilibrium. It has previously been demonstrated that this region of chromosome 4 might have decreased recombination⁵¹, and the lack of a constant recombination rate across the region may interfere with the relationship between distance and measure of allelic association. Other factors such as mutation, admixture and drift would also disturb the expected relationship between physical distance and degree of linkage disequilibrium. These findings suggest that a linear relationship between disequilibrium and physical distance may not exist in this chromosomal region in the population tested.

In conclusion, it has previously been cautioned that measures of disequilibrium cannot be correlated with physical distance between loci³⁰. The two regions of nonrandom association shown in this study made it difficult to define a precise location for the HD gene. Since the identification of the expanding CAG trinucleotide repeat in a novel gene, 120 kb from D4S95, the two regions of disequilibrium remain unexplained.
Unless the results are an artifact of an inaccurate assumption that the control chromosomes reflect the population from which the HD chromosomes were derived, the most likely hypothesis to account for the findings in this study is that other factors, including selection and mutation, are acting in an as yet unknown fashion.

3.4 REFERENCES

1. Lewontin RC and Kojima KI (1960). The evolutionary dynamics of complex polymorphisms. Evolution 14:458-472.
2. Weir BS (1990). Genetic Data Analysis. Sinauer, Sunderland, MA.
3. Lewontin RC (1974). The genetic basis of evolutionary change. Columbia University Press, New York.
4. Kan YW and Dozy AM (1978). Polymorphism of DNA sequence adjacent to human beta globin structural gene: relationship to sickle mutation. Proc Natl Acad Sci USA 75:5631-5635.
5. Fujita R, Hanauer A, Sirugo G, Heilig R, Mandel JL (1990). Additional polymorphisms at marker loci D9S5 and D9S15 generate extended haplotypes in linkage disequilibrium with Friedreich ataxia. Proc Natl Acad Sci USA 87:1796-1800.
6. Harley HG, Brook JD, Floyd J, Rundle SA, Crow S, Walsh KV, Thibault MC, Harper PS, Shaw DJ (1991). Detection of linkage disequilibrium between the myotonic dystrophy locus and a new polymorphic DNA marker. Am J Hum Genet 49:68-75.
7. Thomas GR, Roberts EA, Rosales TO, Moroz SP, Lambert MA, Wong LTK, Cox DW (1993). Allelic association and linkage studies in Wilson disease. Hum Mol Genet 2:1401-1405.
8. Kerem BS, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui L-C (1989). Identification of the cystic fibrosis gene: genetic analysis. Science 245:1073-1080.
9. Hastbacka J, de la Chapelle A, Kaitila I, Sistonen P, Weaver A, Lander E (1992). Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nature Genet 2:204-211.
10. Conneally PM, Haines JL, Tanzi RE, Wexler NS, Penchaszadeh GK, Harper PS, Folstein SE, Cassiman JJ, Myers RH, Young AB, Hayden MR, Falek A, Tolosa ES, Crespi S, Maio L, Holmgren G, Anvret M, Kanazawa I, Gusella JF (1989). Huntington disease: no evidence for locus heterogeneity. Genomics 5:304-308.
11. Hayden MR (1981). Huntington's chorea. Springer-Verlag, New York.
12. Hill WG and Robertson A (1968). Linkage disequilibrium in finite populations. Theor Appl Genet 38:226-231.
13. Sved JA (1971). Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor Popul Biol 2:125-141.
14. Steinmetz M, Uematsu Y, Lindahl KF (1987). Hotspots of homologous recombination in mammalian genomes. Trends Genet 3:7-10.
15. Litt M and Jorde B (1986). Linkage disequilibrium between pairs of loci within a highly polymorphic region of chromosome 2q. Am J Hum Genet 39:166-178.
16. Bech-Hansen NT, Linsley PS, Cox DW (1983). Restriction fragment length polymorphisms associated with immunoglobulin Cγ genes reveal linkage disequilibrium and genomic organization. Proc Natl Acad Sci USA 80:6952-6956.
17. Aschbacher A, Buetow K, Chung D, Walsh S, Murray J (1985). Linkage disequilibrium of RFLPs associated with the α, β and γ fibrinogen predict gene order on chromosome 4. Am J Hum Genet Suppl 37:A186.
18. Chakravarti A, Elbein SC, Permutt MA (1986). Evidence for increased recombination near the human insulin gene: implications for disease association studies. Proc Natl Acad Sci USA 83:1045-1049.
19. Chakraborty R, Lidsky AS, Daiger SP, Guttler F, Sullivan S, DiLella AG, Woo SLC (1987). Polymorphic DNA haplotypes at the human phenylalanine hydroxylase locus and their relationship with phenylketonuria. Hum Genet 76:40-46.
20. Daiger SP, Chakraborty R, Reed L, Fekete G, Schuler D, Berenssi G, Nasz I, Brdicka R, Kamaryt J, Pijackova A, Moore S, Sullivan S, Woo SLC (1989). Polymorphic DNA haplotypes at the phenylalanine hydroxylase (PAH) locus in European families with phenylketonuria (PKU). Am J Hum Genet 45:310-318.
21. Leitersdorf E, Chakravarti A, Hobbs HH (1989). Polymorphic DNA haplotypes at the LDL receptor locus. Am J Hum Genet 44:409-421.
22. Elbein SC (1992). Linkage disequilibrium among RFLPs at the insulin-receptor locus despite intervening Alu repeat sequences. Am J Hum Genet 51:1103-1110.
23. Thompson EA, Deeb S, Walker D, Motulsky AG (1988). The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII apolipoprotein genes. Am J Hum Genet 42:113-124.
24. Barker D, Holm T, White R (1984). A locus on chromosome 11p with multiple restriction site polymorphisms. Am J Hum Genet 36:1159-1171.
25. Chakravarti A, Phillips JA, Mellits KH, Buetow KH, Seeburg PH (1984). Patterns of polymorphism and linkage disequilibrium suggest independent origins of the human growth hormone gene cluster. Proc Natl Acad Sci USA 81:6085-6089.
26. Borresen AL, Moller P, Berg K (1988). Linkage disequilibrium analyses and restriction mapping of four RFLPs at the proα2(I) collagen locus: lack of correlation between disequilibrium and physical distance. Hum Genet 78:216-221.
27. Tzall S, Ellenbogen A, Eng F, Hirschhorn R (1989). Identification and characterization of nine RFLPs at the adenosine deaminase (ADA) locus. Am J Hum Genet 44:864-875.
28. Hegele RA, Plaetke R, Lalouel JM (1990). Linkage disequilibrium between DNA markers at the low-density lipoprotein receptor gene. Genet Epidemiol 7:69-81.
29. Benlian P, Boileau C, Loux N, Pastier D, Masliah J, Coulson M, Nigou M, Ragab A, Guimard J, Ruidavets TB, Bonaiti-Pellie C, Fruchart JC, Douste-Blazy P, Bereziat G, Junien C (1991). Extended haplotypes and linkage disequilibrium between 11 markers at the APOA1-C3-A4 gene cluster on chromosome 11. Am J Hum Genet 48:903-910.
30. Haviland MB, Kessling AM, Davignon J, Sing CF (1991). Estimation of Hardy-Weinberg and pairwise disequilibrium in the apolipoprotein AI-CIII-AIV gene cluster. Am J Hum Genet 49:350-365.
31. Walter MA and Cox DW (1991). Nonuniform linkage disequilibrium within a 1,500-kb region of the human immunoglobulin heavy-chain complex. Am J Hum Genet 49:917-931.
32. Zerba KE, Kessling AM, Davignon J, Sing CF (1991). Genetic structure and the search for genotype-phenotype relationships: an example from disequilibrium in the Apo B gene region. Genetics 129:525-533.
33. Miserez AR, Schuster H, Chiodetti N, Keller U (1993). Polymorphic haplotypes and recombination rates at the LDL receptor gene locus in subjects with and without familial hypercholesterolemia who are from different populations. Am J Hum Genet 52:808-826.
34. Jorde LB, Watkins WS, Viskochil D, O'Connell P, Ward K (1993). Linkage disequilibrium in the neurofibromatosis 1 (NF1) region: implications for gene mapping. Am J Hum Genet 53:1038-1050.
35. Weir BS and Cockerham CC (1978). Testing hypotheses about linkage disequilibrium with multiple alleles. Genetics 88:633-642.
36. Hill WG (1975). Linkage disequilibrium among multiple neutral alleles produced by mutation in finite populations. Theor Popul Biol 8:117-126.
37. Snell RG, Lazarou L, Youngman S, Quarrell OWJ, Wasmuth JJ, Shaw DJ, Harper PS (1989). Linkage disequilibrium in Huntington's disease: an improved localization for the gene. J Med Genet 42:673-675.
38. MacDonald ME, Lin C, Srinidhi L, Bates G, Altherr M, Whaley WL, Lehrach H, Wasmuth J, Gusella JF (1991). Complex patterns of linkage disequilibrium in the Huntington disease region. Am J Hum Genet 49:723-734.
39. Adam S, Theilmann J, Buetow K, Hedrick A, Collins C, Weber B, Huggins M, Hayden M (1991). Linkage disequilibrium and modification of risk for Huntington disease. Am J Hum Genet 48:595-603.
40. Barron L, Curtis A, Shrimpton AE, Holloway S, May H, Snell RG, Brock DJH (1991). Linkage disequilibrium and recombination make a telomeric site for the Huntington's disease gene unlikely. J Med Genet 28:520-522.
41. Theilmann J, Kanani S, Shiang R, Robbins C, Quarrell O, Huggins M, Hedrick A, Weber B, Collins C, Wasmuth JJ, Buetow KH, Murray JC, Hayden MR (1989). Nonrandom allelic association between alleles detected at D4S95 and D4S98 and the Huntington disease gene. J Med Genet 26:676-681.
42. Novelletto A, Mandich P, Bellone E, Malaspina P, Vivona G, Ajmar F, Frontali M (1991). Non-random association between DNA markers and Huntington disease locus in the Italian population. Am J Med Genet 40:374-376.
43. Weber B, Collins C, Kowbel D, Riess O, Hayden MR (1991). Identification of multiple CpG islands and associated conserved sequences in a candidate region for the Huntington disease gene. Genomics 11:1113-1124.
44. Pritchard CA, Casher D, Uglum E, Cox DR, Myers RM (1989). Isolation and field-inversion gel electrophoresis analysis of DNA markers close to the Huntington disease gene. Genomics 4:408-418.
45. Botstein D, White RL, Skolnick M, Davis RW (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331.
46. Weir BS (1990). Genetic Data Analysis. Sinauer, Sunderland, MA.
47. Scott HS, Ashton LJ, Eyre HJ, Baker E, Brooks DA, Callen DF, Sutherland GR, Morris CP, Hopwood JJ (1990). Chromosomal localization of the human α-L-iduronidase gene (IDUA) to 4p16.3. Am J Hum Genet 47:802-807.
48. Weber B, Riess O, Hutchinson G, Collins C, Lin B, Kowbel D, Andrew S, Shappert K, Hayden MR (1991). Genomic organization and complete sequence of the human gene encoding the β-subunit of the cGMP phosphodiesterase and its location to 4p16.3. Nucl Acids Res 19:6263-6268.
49. Collins C, Shappert K, Hayden MR (1992). The genomic organization of a novel regulatory myosin light chain (MYL5) that maps to chromosome 4p16.3 and shows different patterns of expression between primates. Hum Mol Genet 1:727-733.
50. Chakravarti A, Buetow KH, Antonarakis SE, Waber PG, Boehm CD, Kazazian HH (1984). Nonuniform recombination within the human β-globin gene cluster. Am J Hum Genet 36:1239-1258.
51. Buetow KH, Shiang R, Yang P, Nakamura Y, Lathrop GM, White R, Wasmuth JJ, Wood S, Berdahl LD, Leysens NJ, Ritty TM, Wise M, Murray JC (1991). A detailed multipoint map of human chromosome 4 provides evidence for linkage heterogeneity and position-specific recombination rates. Am J Hum Genet 48:911-925.
52. Chakravarti A (1992). Fragile X founder effect? Nature Genet 1:237-238.

CHAPTER 4
DNA ANALYSIS OF DISTINCT POPULATIONS

The work presented in this chapter has contributed to two publications.
Andrew SE, Theilmann J, Almqvist E, Norremolle A, Lucotte G, Anvret M, Sorenson SA, Turpin JC, Hayden MR (1993). DNA analysis of distinct populations suggests multiple origins for the mutation causing Huntington disease. Clin Genet 43:286-294.
Almqvist E, Andrew SE, Theilmann J, Goldberg P, Zeisler J, Drugge U, Grandell U, Tapper-Persson M, Winblad B, Hayden MR, Anvret M. Geographical distribution of haplotypes in Swedish families with Huntington disease (in press, Human Genetics).

4.1 INTRODUCTION

In an effort to refine the localization of the HD gene, studies of linkage disequilibrium have been undertaken in populations of various ancestries with DNA markers from 4p16.3. A total of seven allelic association studies have been reported, revealing significant associations between various markers and HD in some populations and not in others (Table 3-1). Although some markers appear to be nonrandomly associated in most populations (i.e., D4S95)¹⁻⁵, conflicting results supporting and refuting nonrandom association between D4S10, D4S43, D4S98 and the HD gene have been reported¹⁶. In Chapter 3, using a population of mixed ancestry and a UK population, a distal region of disequilibrium was identified at D4S133/D4S228, about 3 Mb distal to D4S95.

Results of association studies can be significantly biased if the ancestry of the control population is not similar to that of the affected population.
In addition to considering ethnic-specific allele frequencies, if a disease is caused by more than one mutation at the same locus, different mutations associated with different DNA haplotypes might predominate in various populations. Associations due to identity by descent would be stronger when analysing descendants of a few common ancestors rather than pooling all persons with different mutations⁷. This argues for studies of nonrandom association in selected distinct populations where controls and affected individuals are more likely to be of similar ancestry and where there might be only a few, or perhaps a single, mutation underlying HD. For disequilibrium mapping, a judicious choice of isolated, well-defined populations with well-characterized histories may be essential to decrease the influence of evolutionary forces such as drift and admixture⁸. This is successfully demonstrated by the search for the mutation causing diastrophic dysplasia (DTD), where linkage disequilibrium in the isolated Finnish population of DTD families, all descended from one founder chromosome, was used to localize the DTD gene to a region of approximately 60 kb, from a previously estimated region of 1.6 Mb⁹.

Another method of analysis of presumed homogeneous populations is to establish haplotypes for affected chromosomes. Development of extended HD haplotypes can lead to estimates of the minimum number of original mutations in a particular geographical region or population.

To date, there have been few HD association studies in populations of clearly defined ancestry. One association study was based on 46 pedigrees of Italian ancestry⁶. Two studies have been based on HD families of UK descent²,⁴. In these studies, however, unaffected family members and others were included as controls without stringently determining their ancestral background.
Three association studies have been based on North American populations¹,³,⁵, where it is especially inappropriate to assume that the spouse of the affected person is of similar ancestry. These studies are therefore likely to have included controls with diverse ancestral backgrounds, which might have confounded the results.

In this analysis, nonrandom allelic association between HD and several markers previously shown to be in association with HD is investigated in three distinct populations of French, Swedish and Danish ancestry, and the results are compared with a similar-sized group of HD families of UK descent. Haplotypes of HD chromosomes were constructed within each population to determine the number of independent mutations for the HD gene in the French, Swedish and Danish populations.

4.2 RESULTS

4.2.1 ASSESSMENT OF NONRANDOM ASSOCIATION

For analysis of linkage disequilibrium in homogeneous populations, DNA was obtained from a total of 276 individuals living in France, Denmark and Sweden. A total of 149 HD samples and 127 non-HD samples representing 75 unrelated families were collected. Ninety-eight samples from 26 HD pedigrees were obtained from Sweden, 121 samples from 18 pedigrees were Danish in origin, and 57 samples from 31 pedigrees were from the Reims region of France. Samples were carefully selected in each of these countries such that all individuals within each population were of the same ancestral background, with the controls of the same background as the affected individuals.

In addition, HD and non-HD samples of a UK population, with autochthonous control chromosomes, were used for comparison with the other populations. This sample was obtained by selecting HD and non-HD individuals randomly from the UK cohort of Chapter 3 until 20 HD individuals were obtained, resulting in a group with a number of HD and control chromosomes similar to that of the other populations tested in this study.
Affected and non-HD individuals from a population of multiple different ancestries were also assessed for comparison.

Four DNA markers shown in Chapter 3 to be in nonrandom allelic association with HD were used for analysis of the homogeneous Danish, Swedish and French populations in this chapter (Table 4-1).

Table 4-1. Polymorphic markers used in association analysis.
Probe | Locus | Enzyme | Allele | Size (kb)
pBS674E-D (674) | D4S95 | MboI | A | 1.3
 | | | B | 0.7/0.6
BS1 | D4S133 | MboI | A | 0.8
 | | | B | 0.7
cl6Dp/E2Rep (E2) | D4S228 | SstI | 0 | 5.0
 | | | 1 | 4.8
 | | | 2 | 4.5
 | | | 3 | 3.8
cl6Dp/M4.2 (4.2) | D4S228 | SstI | A | 8.0
 | | | B | 6.5

The presence of significant allelic association was tested using a χ² test of homogeneity. For the multiallelic marker E2 (D4S228), alleles other than the most common allele were pooled to form one class. Because the χ² statistic depends on sample size, the Yule coefficient was also calculated as another measure of association.

The allele frequencies for RFLPs on HD and control chromosomes and the measures of allelic association are shown in Table 4-2.

Table 4-2. Allele frequencies for RFLPs on HD and non-HD chromosomes.
Ancestry | HD chromosomes, # (%) | Non-HD chromosomes, # (%) | χ² | df | P | |Q|

674 (D4S95), MboI
French | A 15 (88.2), B 2 (11.8); total 17 | A 39 (67.2), B 19 (32.8); total 58 | 1.93 | 1 | 0.17 | 0.57
Swedish | A 13 (72.2), B 5 (27.8); total 18 | A 47 (62.7), B 28 (37.3); total 75 | 0.24 | 1 | 0.63 | 0.22
Danish | A 9 (52.9), B 8 (47.1); total 17 | A 88 (60.7), B 57 (39.3); total 145 | 0.13 | 1 | 0.72 | 0.16
UK | A 17 (85.0), B 3 (15.0); total 20 | A 45 (56.2), B 35 (43.8); total 80 | 4.46 | 1 | 0.035 | 0.63
Multiple ancestry | A 93 (80.9), B 22 (19.1); total 115 | A 319 (63.9), B 180 (36.1); total 499 | 11.40 | 1 | 0.00074 | 0.41

BS1 (D4S133), MboI
French | A 2 (10.0), B 18 (90.0); total 20 | A 12 (21.4), B 44 (78.6); total 56 | 0.63 | 1 | 0.43 | 0.42
Swedish | A 4 (16.7), B 20 (83.3); total 24 | A 9 (11.7), B 68 (88.3); total 77 | 0.082 | 1 | 0.77 | 0.20
Danish | A 1 (7.1), B 13 (92.9); total 14 | A 23 (18.3), B 103 (81.7); total 126 | 0.45 | 1 | 0.50 | 0.49
UK | A 0 (0.0), B 20 (100.0); total 20 | A 12 (19.7), B 49 (80.3); total 61 | 3.19 | 1 | 0.074 | -
Multiple ancestry | A 4 (4.1), B 94 (95.9); total 98 | A 65 (15.9), B 345 (84.1); total 410 | 8.36 | 1 | 0.0038 | 0.63

E2 (D4S228), SstI (first value in each χ²/df/P cell: all alleles; second value: allele 1 vs. pooled other alleles)
French | 0: 0 (0.0), 1: 17 (94.4), 2: 0 (0.0), 3: 1 (5.6); total 18 | 0: 1 (1.9), 1: 45 (84.9), 2: 0 (0.0), 3: 7 (13.2); total 53 | 1.18; 0.41 | 2; 1 | 0.55; 0.52 | 0.50
Swedish | 0: 0 (0.0), 1: 21 (91.3), 2: 1 (4.3), 3: 1 (4.3); total 23 | 0: 1 (1.3), 1: 70 (88.6), 2: 1 (1.3), 3: 7 (8.9); total 79 | 1.63; 0.00022 | 3; 1 | 0.65; 0.99 | 0.15
Danish | 0: 0 (0.0), 1: 15 (93.8), 2: 0 (0.0), 3: 1 (6.2); total 16 | 0: 0 (0.0), 1: 105 (79.6), 2: 4 (3.0), 3: 23 (17.4); total 132 | 1.93; 1.07 | 2; 1 | 0.38; 0.30 | 0.59
UK | 0: 0 (0.0), 1: 20 (100.0), 2: 0 (0.0), 3: 0 (0.0); total 20 | 0: 0 (0.0), 1: 71 (84.5), 2: 2 (2.4), 3: 11 (13.1); total 84 | 3.54; 2.26 | 2; 1 | 0.17; 0.13 | -
Multiple ancestry | 0: 0 (0.0), 1: 105 (95.5), 2: 2 (1.8), 3: 3 (2.7); total 110 | 0: 3 (0.6), 1: 447 (84.7), 2: 15 (2.8), 3: 63 (11.9); total 528 | 9.66; 8.19 | 3; 1 | 0.022; 0.0042 | 0.58

4.2 (D4S228), SstI
French | A 1 (6.3), B 15 (93.7); total 16 | A 5 (12.8), B 34 (87.2); total 39 | 0.05 | 1 | 0.81 | 0.38
Swedish | A 2 (8.7), B 21 (91.3); total 23 | A 8 (10.3), B 70 (89.7); total 78 | 0.03 | 1 | 0.86 | 0.09
Danish | A 1 (6.2), B 15 (93.8); total 16 | A 28 (21.2), B 104 (78.8); total 132 | 1.19 | 1 | 0.28 | 0.60
UK | A 0 (0.0), B 20 (100.0); total 20 | A 14 (21.5), B 51 (78.5); total 65 | 3.71 | 1 | 0.05 | -
Multiple ancestry | A 6 (5.5), B 103 (94.5); total 109 | A 74 (17.4), B 350 (82.6); total 424 | 8.79 | 1 | 0.0030 | 0.57

The statistically significant nonrandom association previously observed between the HD gene and D4S95 (674 MboI polymorphism)¹⁻⁵, or between HD and D4S133/D4S228 (BS1 MboI, E2 SstI and 4.2 SstI polymorphisms), is not observed in this study. None of the
None of theP values obtained with these markers are significant at P <0.05 within the populationstested (Table 4-2).For comparison, a UK population of similar size to the other populations was alsoanalyzed, and significant association was observed between both 674(D4S95) and4.2(D4S228) and HD, with P values of 0.035 and 0.050 respectively (Table 4-2).The control allele frequencies within each of these populations for each of the four probestested are shown in Table 4-2. The significance of the differences in allele frequencies wasdetermined by a X2 analysis. The numbers of chromosomes counted for each allele wereanalyzed between populations in a pairwise fashion, with all possible pairwisecombinations tested (data not shown). No significant differences (P < 0.05) wereobserved between any populations.To investigate underlying differences between HD populations of different ancestralbackgrounds, allele frequencies of these DNA markers on HI) chromosomes from France,Sweden, Denmark, UK. and the population of multiple different ancestries (Table 4-2)were compared. Alleles on HI) chromosomes were counted and analyzed betweenpopulations in a pairwise fashion, with all possible pairwise combinations tested.Significant differences were seen when allele frequencies for FID chromosomes werecompared between the Danish population and the multiple ancestry population at674(D4S95). A X2 analysis of the allele frequencies, with one degree of freedom, gave aX2 value of 5.08 with a P value of 0.024.774.2.2 DNA HAPLOTYPE ANALYSIS OF HD CHROMOSOMESThe differences in allele frequencies for 674(D4S95) on HI) chromosomes between Danishand UK families raised the possibility of multiple different origins for the HI) gene at leastin these two countries. 
To investigate this further, haplotypes for the HD chromosomes were constructed with alleles at 674 (D4S95), BS1 (D4S133), E2 (D4S228) and 4.2 (D4S228), from pedigrees in the Swedish, Danish, French and UK populations as well as in the combined group of multiple ancestries (Table 4-3, Figure 4-1).

Despite the relative homogeneity of these populations, more than one HD haplotype was observed within each population (Table 4-3). One identical major haplotype (haplotype 1) was observed in all the populations tested. Haplotype 2 was the second most frequent haplotype observed but was restricted to the Danish, Swedish and multiple ancestry populations. Haplotype 2 differed from haplotype 1 at only one marker (D4S95), suggesting that the two haplotypes could be related. However, the subsequent localization of the HD mutation demonstrated that this marker is the closest to the HD mutation, and thus haplotype 2 more likely represents a distinct haplotype than a derivative of haplotype 1. Haplotype 3 is incomplete at 674 (D4S95) because phase could not be determined in some families owing to limitations in pedigree structure. Resolution of the alleles at 674 (D4S95) in haplotype 3 would significantly increase the number of haplotypes 1 or 2 observed (Figure 4-1).

Two distinct HD haplotypes (haplotypes 1 and 5) were observed in the French population (Table 4-3). Six pedigrees contained haplotype 1, and one pedigree contained another haplotype that differed at two of the four markers tested (haplotype 5).
Phase could not be established at 674(D4S95) in six French pedigrees.

Table 4-3. Haplotypes for HD chromosomes from Swedish, Danish, French, UK and combined ancestry populations.

Haplotype  674      BS1       E2        4.2       French      Danish      Swedish     UK          Multiple     Total
           (D4S95)  (D4S133)  (D4S228)  (D4S228)  # (%)       # (%)       # (%)       # (%)       ancestry     # (%)
                                                                                                  # (%)
1          A        B         1         B         6 (46.2)    6 (42.9)    10 (43.5)   9 (81.8)    58 (66.7)    89 (60.1)
2          B        B         1         B         0 (0.0)     6 (42.9)    5 (21.7)    0 (0.0)     16 (18.4)    27 (18.2)
3          -        B         1         B         6 (46.2)    0 (0.0)     5 (21.7)    2 (18.2)    8 (9.2)      21 (14.2)
4          A        A         1         B         0 (0.0)     1 (7.1)     1 (4.4)     0 (0.0)     1 (1.1)      3 (2.0)
5          A        -         3         A         1 (7.6)     1 (7.1)     0 (0.0)     0 (0.0)     1 (1.1)      3 (2.0)
6          A        A         3         A         0 (0.0)     0 (0.0)     1 (4.4)     0 (0.0)     0 (0.0)      1 (0.7)
7          B        -         3         A         0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     1 (1.1)      1 (0.7)
8          A        A         1         A         0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     1 (1.1)      1 (0.7)
9          A        -         2         A         0 (0.0)     0 (0.0)     0 (0.0)     0 (0.0)     1 (1.1)      1 (0.7)
10         -        A         2         A         0 (0.0)     0 (0.0)     1 (4.4)     0 (0.0)     0 (0.0)      1 (0.7)
Total                                             13 (100.0)  14 (100.0)  23 (100.0)  11 (100.0)  87 (100.0)   148 (100.0)

Figure 4-1. Schematic of haplotypes seen in French, Danish, Swedish and UK populations. Each box represents a marker used in haplotype analysis: D4S95(674), D4S133(BS1), D4S228(E2) and D4S228(4.2). Haplotypes are numbered according to Table 4-3. Arrows between haplotypes suggest possible evolutionary relationships between haplotypes.

In the Danish population, there were four distinctly different haplotypes (Table 4-3). Haplotypes 1 and 2 were each observed in six pedigrees. Haplotype 5 was observed in one pedigree and haplotype 4 was observed once in the population tested.

A minimum of five different haplotypes was observed in the Swedish population (Table 4-3). Haplotype 1 was observed on ten chromosomes, while haplotypes 4, 6 and 10 occurred on only one chromosome each. Haplotype 2 was the second most frequent haplotype in the Swedish population and was observed in four HD families. Five haplotypes were incomplete at the 674(D4S95) locus due to pedigree structure.

HD haplotypes were also constructed for the UK population of similar size to the other populations tested (Table 4-3).
Only one major haplotype was observed (haplotype 1), as well as two pedigrees with the incomplete haplotype 3.

For comparison, HD haplotypes were also constructed for a larger population consisting of 87 pedigrees of multiple origins (Table 4-3). One major haplotype (haplotype 1) and seven other minor frequency haplotypes were observed when HD chromosomes of multiple origins were examined (Table 4-3). Haplotype 2 was the second most frequent haplotype (18.4%), the incomplete haplotype 3 accounted for 9.2%, and each of the other minor haplotypes (haplotypes 4, 5, 7, 8 and 9) was represented once (1.1%).

Finally, in the absence of double recombination or gene conversion events at more than one site, these analyses show that there is a minimum of two HD mutations in the French population, four in the Danish population and five in the Swedish population studied (Table 4-3).

4.2.3 DNA HAPLOTYPE ANALYSIS OF CONTROL CHROMOSOMES

To compare haplotypes of HD chromosomes with those of control chromosomes, DNA haplotypes were also constructed for control chromosomes (the unaffected chromosome of each patient and the chromosomes of unaffected spouses) from the French, Danish, Swedish and UK populations as well as the multiple ancestry group (Table 4-4). Of 451 control chromosomes with complete haplotypes, 17 of 32 possible haplotypes (53.1%) were observed. The observed distribution of haplotypes was significantly different from that expected from the allele frequencies of the polymorphisms (χ² = 194.02, P < 0.00001, with 23 df), owing to significant linkage disequilibrium between these markers on normal chromosomes.

4.2.4 HAPLOTYPE COMPARISONS

Two of the rare HD haplotypes (haplotypes 8 and 9) were not observed in the control chromosomes. Haplotype 8 is complete, but haplotype 9 could be identical to haplotype 10, which was rarely seen in HD (0.7%) and control (0.2%) populations.
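The minimum numbers of HD mutations given above rest on two rules used informally in the text: an unresolved allele ("-") may match either allele, so haplotype 3 could be haplotype 1 or 2 and haplotype 9 could be haplotype 10; and an incomplete haplotype adds to the count only if it cannot match any haplotype already counted. The sketch below encodes those rules under stated assumptions. Haplotypes are written as four-character strings in the marker order of Table 4-3, the helper names are hypothetical, and the greedy merge is a simplification: in general, choosing resolutions that minimize the count is a clique-cover problem, and a greedy pass can miscount when several incomplete haplotypes interact.

```python
from typing import Dict, Iterable, List

MISSING = "-"  # allele whose phase could not be established


def compatible(h1: str, h2: str) -> bool:
    """True if two haplotypes agree at every marker where both alleles are known."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same markers")
    return all(a == b or MISSING in (a, b) for a, b in zip(h1, h2))


def greedy_distinct(haplotypes: Iterable[str]) -> List[str]:
    """Greedy count of distinct haplotypes, folding incomplete ones into matches.

    Complete haplotypes are visited first (then alphabetically, for a
    deterministic order); a haplotype is kept only if it is incompatible
    with every haplotype kept so far.
    """
    ordered = sorted(set(haplotypes), key=lambda h: (h.count(MISSING), h))
    kept: List[str] = []
    for h in ordered:
        if not any(compatible(h, k) for k in kept):
            kept.append(h)
    return kept


# HD haplotypes present in each population (Table 4-3), markers in the
# order 674(D4S95), BS1(D4S133), E2(D4S228), 4.2(D4S228).
HD_HAPLOTYPES: Dict[str, List[str]] = {
    "French": ["AB1B", "-B1B", "A-3A"],                            # 1, 3, 5
    "Danish": ["AB1B", "BB1B", "AA1B", "A-3A"],                    # 1, 2, 4, 5
    "Swedish": ["AB1B", "BB1B", "-B1B", "AA1B", "AA3A", "-A2A"],   # 1, 2, 3, 4, 6, 10
    "UK": ["AB1B", "-B1B"],                                        # 1, 3
}

if __name__ == "__main__":
    for population, haps in HD_HAPLOTYPES.items():
        print(population, len(greedy_distinct(haps)))
```

For these data the counts come out as 2, 4, 5 and 1, in line with the minimum of two, four and five mutations stated for the French, Danish and Swedish populations and the single haplotype inferred for the UK.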
Thirteen control haplotypes (haplotypes 11 to 23), accounting for 9.3% of normal chromosomes, were not observed on HD chromosomes (Tables 4-3 and 4-4).

The numbers of each haplotype observed in the HD group and the control group for each country were compared by χ² analysis (data not shown). There were no significant haplotype differences in the Swedish, Danish, French or UK populations. In contrast, however, when control haplotypes were compared with the haplotypes of HD patients of multiple ancestries, significant association of the major haplotype (haplotype 1) with HD was observed (χ² = 11.65, P = 0.00064, with 1 df). In addition, when all the populations were combined and the total number of control chromosomes was compared with the total number of HD chromosomes, significant differences were observed (χ² = 41.6, P = 0.0071, with 22 df; χ² = 8.87, P = 0.0029, with 1 df).

Table 4-4. Haplotypes for control chromosomes from Swedish, Danish, French, UK and combined ancestry populations.

Haplotype  674      BS1       E2        4.2       French      Danish       Swedish     UK          Multiple     Total
           (D4S95)  (D4S133)  (D4S228)  (D4S228)  # (%)       # (%)        # (%)       # (%)       ancestry     # (%)
                                                                                                   # (%)
1          A        B         1         B         18 (56.3)   42 (42.0)    38 (48.7)   23 (47.9)   114 (44.7)   235 (45.8)
2          B        B         1         B         5 (15.6)    30 (30.0)    21 (26.9)   13 (27.1)   71 (27.8)    140 (27.3)
3          -        B         1         B         6 (18.8)    10 (10.0)    9 (11.5)    1 (2.1)     22 (8.6)     48 (9.4)
4          A        A         1         B         0 (0.0)     0 (0.0)      1 (1.3)     1 (2.1)     1 (0.4)      3 (0.6)
5          A        -         3         A         0 (0.0)     1 (1.0)      0 (0.0)     0 (0.0)     5 (2.0)      6 (1.2)
6          A        A         3         A         1 (3.1)     8 (8.0)      3 (3.8)     6 (12.5)    16 (6.3)     34 (6.6)
7          B        -         3         A         0 (0.0)     0 (0.0)      0 (0.0)     0 (0.0)     7 (2.7)      7 (1.4)
8          A        A         1         A         0 (0.0)     0 (0.0)      0 (0.0)     0 (0.0)     0 (0.0)      0 (0.0)
9          A        -         2         A         0 (0.0)     0 (0.0)      0 (0.0)     0 (0.0)     0 (0.0)      0 (0.0)
10         -        A         2         A         0 (0.0)     1 (1.0)      0 (0.0)     0 (0.0)     0 (0.0)      1 (0.2)
11         B        A         3         A         1 (3.1)     3 (3.0)      2 (2.6)     1 (2.1)     2 (0.8)      9 (1.7)
12         A        B         1         A         0 (0.0)     2 (2.0)      0 (0.0)     1 (2.1)     2 (0.8)      5 (1.0)
13         B        A         1         B         0 (0.0)     2 (2.0)      1 (1.3)     0 (0.0)     0 (0.0)      3 (0.6)
14         A        B         3         A         0 (0.0)     0 (0.0)      1 (1.3)     1 (2.1)     1 (0.4)      3 (0.6)
15         A        A         3         B         0 (0.0)     0 (0.0)      1 (1.3)     1 (2.1)     5 (2.0)      7 (1.4)
16         A        A         2         A         0 (0.0)     1 (1.0)      0 (0.0)     0 (0.0)     1 (0.4)      2 (0.4)
17         A        B         0         B         1 (3.1)     0 (0.0)      0 (0.0)     0 (0.0)     1 (0.4)      2 (0.4)
18         B        B         0         B         0 (0.0)     0 (0.0)      1 (1.3)     0 (0.0)     1 (0.4)      2 (0.4)
19         B        A         2         A         0 (0.0)     0 (0.0)      0 (0.0)     0 (0.0)     1 (0.4)      1 (0.2)
20         A        B         2         B         0 (0.0)     0 (0.0)      0 (0.0)     0 (0.0)     2 (0.8)      2 (0.4)
21         B        B         2         B         0 (0.0)     0 (0.0)      0 (0.0)     0 (0.0)     1 (0.4)      1 (0.2)
22         A        B         2         A         0 (0.0)     0 (0.0)      0 (0.0)     0 (0.0)     1 (0.4)      1 (0.2)
23         B        B         1         A         0 (0.0)     0 (0.0)      0 (0.0)     0 (0.0)     1 (0.4)      1 (0.2)
Total                                             32 (100.0)  100 (100.0)  78 (100.0)  48 (100.0)  255 (100.0)  513 (100.0)

4.3 DISCUSSION

In this study, Danish, Swedish and French populations were examined for nonrandom allelic association with markers previously shown
to be in nonrandom allelic association with the HD gene. These countries of origin were chosen because the available samples were from autochthonous families. Thus, the controls used in this study were of similar ancestry to the HD individuals. In contrast to the results of previous studies¹⁻⁵, no significant associations between any of the markers tested and HD were observed (Table 4-2).

What could account for these differences in reports of nonrandom allelic association between certain markers and the HD mutation? The failure to detect nonrandom association does not always imply its absence, as association is dependent on allele frequencies, sample size, and whether the disequilibrium is positive or negative (minor or major allele associated with the disease, respectively)¹⁰.

Large sample sizes are often required to clearly demonstrate disequilibrium, particularly if the major allele at one locus is in disequilibrium with the disease locus. For this reason, in some instances it may be advantageous to combine all families regardless of ancestry to test markers initially for association, as the increase in sample size could outweigh the disadvantage of using unmatched controls to determine control allele frequencies. This can result in evidence for disequilibrium at loci that appeared to be in equilibrium within each specific population¹⁰,¹¹. Alternatively, spurious association seen with small numbers may become less significant as the sample size increases. Nonrandom association was initially detected between 731(D4S98) and the HD gene¹, but this was not upheld in a sample four times larger³. However, when data from different populations are pooled, disequilibrium statistics may be significantly altered by unrecognized ethnic differences in the allele frequencies of different markers¹². Therefore, one method to confirm previously demonstrated disequilibrium is to ensure that the controls are of similar ancestry to the HD individuals by using a clearly homogeneous population for analysis.
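The dependence of detectable association on allele frequencies, on sample size and on the sign of the disequilibrium can be illustrated with the usual two-locus measures D, D′ and r². The sketch below is illustrative only and is not the analysis performed in this study, which compared HD and control chromosomes by χ² tests; it assumes phased haplotype counts for two biallelic loci, and the counts in the usage example are hypothetical. For a 2 × 2 table the uncorrected Pearson χ² equals n·r², so the same strength of association yields a χ² that grows in proportion to the number of chromosomes examined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoLocusLD:
    d: float        # D = p(AB) - p(A)p(B); negative when A tends to occur with b
    d_prime: float  # D divided by its maximum possible magnitude, keeping the sign
    r2: float       # squared allelic correlation between the two loci
    chi2: float     # uncorrected Pearson chi-square of the 2x2 table (= n * r2)


def two_locus_ld(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> TwoLocusLD:
    """Linkage disequilibrium measures from counts of the four haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    d = n_AB / n - p_A * p_B  # zero under linkage equilibrium
    d_max = min(p_A * p_b, p_a * p_B) if d >= 0 else min(p_A * p_B, p_a * p_b)
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * p_a * p_B * p_b
    r2 = d * d / denom if denom > 0 else 0.0  # undefined for a monomorphic locus
    return TwoLocusLD(d, d_prime, r2, n * r2)


if __name__ == "__main__":
    # Hypothetical counts: the same haplotype frequencies at two sample sizes.
    small = two_locus_ld(50, 10, 10, 30)
    large = two_locus_ld(100, 20, 20, 60)
    print(f"D = {small.d:.3f}, D' = {small.d_prime:.3f}, r2 = {small.r2:.3f}")
    print(f"chi2: n = 100 -> {small.chi2:.1f}, n = 200 -> {large.chi2:.1f}")
```

Doubling the sample leaves D, D′ and r² unchanged but doubles χ², which is why a real but modest association can fail to reach significance in small samples.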
The selection of the controls is an important part of the analyses, and although ancestry was stringently assessed in this study, it may still have been biased by the fact that control chromosomes were from individuals who had married into families with HD.

In this study, the gene for HD segregated with the more common allele within each population, and the sample sizes were likely too small to have the power to detect nonrandom association. However, analysis of a UK population of comparable size did show significant association between HD and both 674(D4S95) and 4.2(D4S228) (Table 4-2). This suggests that factors other than sample size alone were responsible for the absence of nonrandom allelic association in this study.

The variability in results for nonrandom association between HD and marker alleles in different reports has also been observed in association studies of other diseases. For example, significant association between RFLPs within the insulin receptor gene and non-insulin-dependent diabetes mellitus has been reported in a Scandinavian population¹³ and a Mexican-American population¹⁴,¹⁵, but not in two Japanese populations¹⁶,¹⁷. This could be explained as a spurious result of small sample size, as one of the Japanese studies not showing association was based on a sample half as large as those of the prior studies.
Alternatively, the differences in association might be due to the populations being genetically different, with differences between the populations studied in the allele frequencies of the markers tested.

There were no statistically significant differences in allele frequencies for control chromosomes between the four populations reported in this study (Danish, French, Swedish and UK) (Table 4-2), suggesting that the observed lack of association in the non-UK populations was due to a factor or factors other than altered allele frequencies in controls.

A possible explanation for the lack of observed association between markers and HD in this study is that the populations studied may not be homogeneous with respect to a single founder HD chromosome. One method of observing differences between HD chromosomes is to compare the allele frequencies for different markers on affected chromosomes between populations. The difference in D4S95 allele frequencies between HD chromosomes in the Danish population and HD chromosomes in the multiple ancestry population suggested that different chromosomes underlie HD in the different populations.

The possibility of more than one origin for the mutation causing HD in the populations tested, which would account for the lack of linkage disequilibrium observed, was further explored by construction of DNA haplotypes (Table 4-3). Development of extended HD haplotypes can lead to estimates of the minimum number of original mutations in a particular geographical region or population. In this study, one major haplotype (haplotype 1) could account for all patients with HD in the UK population, but at least four additional distinct haplotypes (haplotypes 2, 4, 5/6 and 10) were seen in the French, Danish and Swedish populations, suggesting at least four additional HD mutations.
The absence of nonrandom association in these populations is therefore, at least in part, due to the presence of a minimum of two different haplotypes within each population, compared with only one definite haplotype in the UK population studied. Therefore, it is the multiple HD haplotypes within each population other than the UK that make it impossible to detect any allelic association with HD.

A scenario demonstrating possible evolutionary relationships between the haplotypes based on similarity is shown in Figure 4-2. Haplotypes 1 and 2, although identical at distal markers (D4S133 and D4S228), differ at the marker closest to the HD mutation (D4S95) and likely represent distinct HD haplotypes. The minor frequency haplotypes differ from haplotypes 1 and 2 at two or more markers. Therefore, more than one recombination, or a recombination event together with gene conversion, would have had to occur to make haplotypes 5 to 10 similar to haplotype 1. This scenario seems unlikely given the decreased rate of recombination in this region¹⁸. Haplotype 4, seen in HD patients from Denmark and Sweden, differs from haplotype 1 at only one marker, and could be derived from haplotype 1 by gene conversion or by mutation at the MboI site detected by BS1(D4S133). Alternatively, haplotype 4 could represent a third ancestral HD chromosome.

MacDonald et al., using 8 DNA markers spanning D4S126 to D4S98, studied 78 HD and 168 control chromosomes and reported that about one third of the HD chromosomes were derived from one primordial haplotype in a mixed population of Western European descent¹⁹. Using four markers, including D4S95 and three markers from a more distal region previously shown to be in association with HD, analysis of 148 HD and 513 control chromosomes in this study has shown that approximately 60% of HD chromosomes from all ancestries examined have an identical haplotype and might therefore share a common ancestral chromosome.
The differences in the numbers of predicted primordial haplotypes in all likelihood reflect the different markers, from different regions, used in the two studies. However, what both studies clearly support is multiple origins for HD chromosomes with expanded CAG repeats.

These results support the hypothesis that CAG expansion has occurred multiple times on different chromosomal backgrounds within these populations. Thus several different haplotypes with a tendency for CAG expansion gave rise to multiple HD chromosomes. The alternative hypothesis is that one ancient ancestral HD mutation has evolved dramatically in each population, resulting in distinctly different haplotypes owing to the length of time available for mutation to occur. Recent evidence, however, supports the first hypothesis, as haplotype analysis using markers close to the CAG repeat in the homogeneous Swedish population demonstrates that at least three chromosomal haplotypes underlie HD in this population²⁰.

The multiple HD haplotypes in each population may explain the two clusters of allelic association seen in Chapter 3. Multiple haplotypes carrying, by chance alone, the same allele at a distal marker could account for the disequilibrium at markers 3 Mb from the HD gene.

In summary, these findings show no significant association between HD and the DNA markers tested in the French, Swedish and Danish populations, whereas significant results were demonstrated in a UK population of similar size, suggesting that the absence of association was not predominantly a consequence of allele frequencies or sample size. To further investigate the number of potential HD chromosomes, DNA haplotypes were constructed for the Danish, French, Swedish and UK populations.
The minimum of two HD haplotypes observed in each of the French, Danish and Swedish populations, compared with the one haplotype in the UK population of similar size, is an important factor accounting for the absence of association between HD and the DNA markers in these populations. Furthermore, haplotype analysis of HD chromosomes demonstrated multiple haplotypes within each population, providing support for multiple independent origins for the HD chromosome in the French, Swedish and Danish populations.

4.4 REFERENCES

1. Theilmann J, Kanani S, Shiang R, Robbins C, Quarrell O, Huggins M, Hedrick A, Weber B, Collins C, Wasmuth JJ, Buetow KH, Murray JC, Hayden MR (1989). Nonrandom association between alleles detected at D4S95 and D4S98 and the Huntington’s disease gene. J Med Genet 26:676-681.
2. Snell RG, Lazarou L, Youngman S, Quarrell OWJ, Wasmuth JJ, Shaw DJ, Harper PS (1989). Linkage disequilibrium in Huntington’s disease: An improved localization for the gene. J Med Genet 42:673-675.
3. Adam S, Theilmann J, Buetow K, Hedrick A, Collins C, Weber B, Huggins M, Hayden M (1991). Linkage disequilibrium and modification of risk for Huntington disease. Am J Hum Genet 48:595-603.
4. Barron L, Curtis A, Shrimpton AE, Holloway S, May H, Snell RG, Brock DJH (1991). Linkage disequilibrium and recombination make a telomeric site for the Huntington’s disease gene unlikely. J Med Genet 28:520-522.
5. MacDonald M, Lin C, Srinidhi L, Bates G, Altherr M, Whaley WL, Lehrach H, Wasmuth JJ, Gusella JF (1991). Complex patterns of linkage disequilibrium in the Huntington disease region. Am J Hum Genet 49:723-734.
6. Novelletto A, Mandich P, Bellone E, Malaspina P, Vivona G, Ajmar F, Frontali M (1991). Non-random association between DNA markers and Huntington disease locus in the Italian population. Am J Med Genet 40:374-376.
7. Lander ES (1988). Mapping complex genetic traits in humans, in “Genome analysis: A practical approach”. Davies KE, ed. IRL Press, Oxford.
8. Jorde LB, Watkins WS, Viskochil D, O’Connell P, Ward K (1993). Linkage disequilibrium in the Neurofibromatosis 1 (NF1) region: Implications for gene mapping. Am J Hum Genet 53:1038-1050.
9. Hastbacka J, de la Chapelle A, Kaitila I, Sistonen, Weaver A, Lander E (1992). Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nature Genet 2:204-211.
10. Thompson EA, Deeb S, Walker D, Motulsky AG (1988). The detection of linkage disequilibrium between closely linked markers: RFLPs at the AI-CIII apolipoprotein genes. Am J Hum Genet 42:113-124.
11. Nei M, Li WH (1973). Linkage disequilibrium in subdivided populations. Genetics 75:213-219.
12. Haviland MB (1991). Estimation of Hardy-Weinberg and pairwise disequilibrium in the apolipoprotein AI-CIII-AIV gene cluster. Am J Hum Genet 49:350-365.
13. Sten-Linder M, Olsson M, Iselius L, Efendic S, Luthman H (1991). DNA haplotype analysis suggests linkage disequilibrium in the human insulin receptor gene. Hum Genet 87:469-474.
14. McClain DA, Henry RR, Ullrich A, Olefsky JM (1988). Restriction fragment length polymorphism in insulin receptor gene and insulin resistance in NIDDM. Diabetes 37:1071-1075.
15. Raboudi SM, Mitchell BD, Stern MP, Eifler CW, Haffner SM, Hazuda HP, Frazier ML (1989). Type II diabetes mellitus and polymorphism of insulin receptor gene in Mexican Americans. Diabetes 38:975-980.
16. Takeda J, Seino Y, Yoshimasa Y, Fukumoto H, Koh G, Kuzuya H, Imura H, Seino S (1986). Restriction length polymorphism (RFLP) of the human insulin receptor gene in Japanese: its possible usefulness as a genetic marker. Diabetes 29:667-669.
17. Li SR, Oelbaum RS, Stocks J, Galton DJ (1988). DNA polymorphisms of the insulin receptor gene in Japanese subjects with non-insulin-dependent diabetes mellitus. Hum Hered 38:273-276.
18. Buetow KH, Shiang R, Yang P, Nakamura Y, Lathrop GM, White R, Wasmuth JJ, Wood S, Berdahl LD, Leysens NJ, Ritty TM, Wise ME, Murray JC (1991). A detailed multipoint map of human chromosome 4 provides evidence for linkage heterogeneity and position-specific recombination rates. Am J Hum Genet 48:911-925.
19. MacDonald ME, Novelletto A, Lin C, Tagle D, Barnes G, Bates G, Taylor S, Allitto B, Altherr M, Myers R, Lehrach H, Collins FS, Wasmuth JJ, Frontali M, Gusella JF (1992). The Huntington’s disease candidate region exhibits many different haplotypes. Nature Genet 1:99-103.
20. Almqvist E, Andrew S, Theilmann J, Goldberg P, Zeisler J, Drugge U, Grandell U, Tapper-Persson M, Winblad B, Hayden M, Anvret M. Geographical distribution of haplotypes in Swedish families with Huntington disease. Hum Genet, in press.

CHAPTER 5

NONRANDOM ALLELIC ASSOCIATION AND HAPLOTYPE ANALYSIS USING MARKERS FLANKING THE CAG REPEAT

5.1 INTRODUCTION

The theoretical inverse relationship between disequilibrium and physical distance may not necessarily hold¹. Unequal recombination rates across the genome, as well as factors such as drift, mutation, admixture and gene conversion, may all affect measures of disequilibrium¹. For these reasons, the usefulness of linkage disequilibrium in gene mapping has been questioned². Further understanding of patterns of allelic association in different regions of the genome will aid in the mapping of other disease genes.

Linkage disequilibrium was an important tool in the localization of the gene for cystic fibrosis (CF). Analysis across the candidate region revealed a gradient of increasing association as the CFTR mutation was approached³. The strongest values of disequilibrium were obtained with markers in a 200 kb region centromeric to the most common CF mutation, ΔF508, with the highest degree of allelic association detected between RFLPs only 30 kb from the mutation site⁵. In contrast, during the search for the HD gene, linkage disequilibrium results suggested no definitive region in which to concentrate the search.
However, the large extent of the candidate region and the irregular distribution of polymorphic markers throughout this genomic region may have been responsible for the difficulty in refining the candidate region. Several other features of 4p16.3 suggested that linkage disequilibrium in this region may be more complex than elsewhere in the genome. For example, the overall level of recombination decreases from D4S10 to the telomeric marker D4S90. In addition, a recombination hotspot at D4S10 makes the recombination rate across the candidate region unpredictable⁵. With the localization of the HD mutation, it became possible to reassess linkage disequilibrium with markers flanking the HD gene, to determine whether a pattern similar to that for CF exists for HD or whether the pattern is more complex, as analysis across the 2.5 Mb candidate region initially suggested⁶,⁷.

Analysis of linkage disequilibrium and haplotypes in other diseases caused by dynamic mutation, such as Fragile X (FRAXA) and myotonic dystrophy (DM), has led to hypotheses about the number of founder chromosomes, adding further insight into the mechanism of expansion. Dynamic mutations tend to continue to expand, as demonstrated by further expansion of the CAG repeat associated with HD to a repeat size associated with juvenile onset. The resulting anticipation, with onset of disease before reproductive age, leads to the loss of that HD chromosome. Dynamic mutations are therefore continuously being eliminated from the population; however, the frequency of the mutant alleles is somehow maintained. This could be accomplished by increased reproductive fitness of those with slightly higher numbers of repeats or by the occurrence of new mutations.
However, the strong linkage disequilibrium observed between flanking markers and dynamic mutations such as those associated with FRAX and DM is inconsistent with the concept of recurring new mutations.

Recent haplotype analysis of DM and FRAX chromosomes has suggested a model for the development of dynamic mutations that is in keeping with the observation of allelic association demonstrating a founder chromosome effect. In DM, the mutation is in complete disequilibrium with a nearby two-allele insertion/deletion polymorphism in multiple populations, suggesting a single origin for the predisposing mutation⁸. A proposed multistep model suggests that the basis of the linkage disequilibrium may have been a rare ancestral mutation, on a chromosome carrying the insertion allele, that generated an allele with 19-30 repeats. Subsequently, a second mutational mechanism results in the further expansion of these alleles to the premutation size range (30-50 repeats), which is inherently unstable and expands in subsequent generations to the disease range (>50 repeats)⁸. Thus the DM mutation is always associated with the insertion polymorphism haplotype, yet multiple mutation events occurring on this reservoir of unstable chromosomes prone to expansion generate fully expanded DM mutations and maintain the frequency of disease alleles in the population.

In FRAXA, linkage disequilibrium with flanking markers was not as strong as in DM, which initially suggested multiple initial mutation events⁹,¹⁰. Recent analysis of one of the flanking microsatellite markers initially used for allelic association studies demonstrated a previously unknown complex pattern. The polymorphism actually consists of three variable regions of DNA which, when analyzed separately, are in strong association with the FRAXA mutation, and haplotype data suggest that the mechanism of expansion from a reservoir of large alleles is very similar to that of DM¹¹. MacPherson et al.
demonstrate by haplotype analysis that FRAXA is caused by more than one initial expansion event¹². Thus, the FRAX mutation is also the result of a multistep mutational process, but with likely multiple origins for the CGG expansion.

Analysis of association between HD and markers flanking the CAG repeat will reveal whether the model proposed for FRAXA and DM is common to other dynamic mutation disorders such as HD.

5.2 RESULTS

5.2.1 DNA MARKERS

Six polymorphic markers distributed across a region of approximately 250 kb (Figure 5-1) were typed in 100 unrelated HD pedigrees, primarily of Western European origin (Table 5-1). The unaffected allele of HD patients and chromosomes from unaffected spouses were used as control chromosomes. The ancestries of the spouses were assumed to be similar to those of their affected partners.

Figure 5-1. Map of markers flanking the Huntington disease mutation used in analysis of linkage disequilibrium and haplotypes.

Table 5-1. Polymorphic markers used in analysis.

Locus            Alleles                    Reference (number in text)
Glutamic acid    A, B                       MacDonald et al. (1993) (21)
GT70             A, B                       Rommens et al. (1993) and unpublished data (13)
CCG              7, 8, 9, 10, 11, 12        Andrew et al. (1994) (17)
D4S127           1, 2, 3, 4, 5, 6, 7, 8     Taylor et al. (1992) (16)
D4S95 (AccI)     A, B                       Wasmuth et al. (1988) (15)

The probe GT70 has been previously described¹³; however, allele frequencies for the AccI polymorphism it detects have not been published. The polymorphic CCG repeat adjacent to the CAG repeat, used as one of the markers in this analysis, is discussed further in Chapter 11.

5.2.2 STATISTICAL ANALYSIS

All loci were analyzed for association with HD using a χ² test of homogeneity and the Yule association coefficient. With multiple comparisons, it is likely that one value would be significant at 0.05 by chance alone.
Therefore, the Bonferroni procedure¹⁴ was used to adjust for multiple comparisons, and the corrected required significance level is 0.01.

5.2.3 GENE FREQUENCIES AND ALLELIC ASSOCIATION

The allele frequencies for the six polymorphic markers in 100 HD pedigrees are given in Table 5-2a and Table 5-2b. Allele frequencies for D4S95, D4S127, CCG and the CAG repeat are similar to those previously published¹⁵⁻²¹. Allele frequencies are also provided for the AccI polymorphism detected by the previously described probe GT70¹³.

Alleles at D4S127 and D4S95, 50 and 120 kb respectively telomeric to the HD mutation, are in strong association with the disease chromosome. Of the markers located within the HD gene, the polymorphic CCG repeat directly adjacent to the HD mutation is in very strong allelic association with disease. The glutamic acid deletion polymorphism, approximately 150 kb proximal to the HD mutation, is also in strong disequilibrium with the mutation. In contrast, however, the intronic GT70 polymorphism, located between the glutamic acid polymorphism and the HD mutation, is not in linkage disequilibrium with HD.

Table 5-2a. Allele frequencies for RFLPs on HD and control chromosomes.

Locus           Allele   HD chromosomes     Control chromosomes
                         No.     %          No.     %
Glutamic acid   A        56      68.29      95      89.62
                B        26      31.71      11      10.38
                Total    82      100.00     106     100.00
                χ² = 11.99, P = 0.00053, 1 df; Yule's coefficient 0.6
GT70            A        69      77.53      199     84.68
                B        20      22.47      36      15.32
                Total    89      100.00     235     100.00
                χ² = 1.84, P = 0.17, 1 df; Yule's coefficient 0.23
CCG             7        88      95.65      115     60.21
                9        0       0.00       11      5.76
                10       4       4.35       64      33.51
                11       0       0.00       1       0.52
                Total    92      100.00     191     100.00
                χ² = 38.6, P < 0.00001, 3 df; χ² = 36.73, P < 0.00001, 1 df; Yule's coefficient 0.87
D4S127          1        2       2.38       3       1.59
                2        44      52.38      46      24.34
                3        15      17.86      29      15.34
                4        7       8.33       31      16.40
                5        16      19.05      80      42.33
                Total    84      100.00     189     100.00
                χ² = 25.98, P = 0.00003, 4 df; χ² = 19.44, P = 0.00001, 1 df; Yule's coefficient 0.55
D4S95 (AccI)    A        75      85.23      149     60.32
                B        13      14.77      98      39.68
                Total    88      100.00     247     100.00
                χ² = 17.06, P = 0.00004, 1 df; Yule's coefficient 0.58

Table 5-2b. Allele frequencies of CAG sizes.

CAG length     HD chromosomes     Control chromosomes
(repeats)      No.     %          No.     %
10-15          0       0.00       22      10.68
16-20          0       0.00       147     71.36
21-25          0       0.00       31      15.05
26-30          0       0.00       6       2.91
31-35          0       0.00       0       0.00
36-40          17      17.00      0       0.00
41-45          50      50.00      0       0.00
46-50          21      21.00      0       0.00
51-55          8       8.00       0       0.00
56-60          2       2.00       0       0.00
61-65          2       2.00       0       0.00
Total          100     100.00     206     100.00

5.2.4 HAPLOTYPE ANALYSIS

Complete haplotypes were constructed from HD pedigrees and counted, and their frequencies were compared between normal and affected populations (Table 5-3a). The method of grouping haplotypes is critical to the conclusions reached. Haplotypes are arranged in Table 5-3a according to their degree of disequilibrium with HD. Core haplotypes, consisting of the intragenic markers that demonstrate linkage disequilibrium with HD (the glutamic acid polymorphism and the CCG repeat), are numbered together with their frequencies in the population tested. Distinct haplotypes within each core haplotype, and their frequencies in the population tested, are listed below the core haplotype.

Twenty-five distinct HD haplotypes were observed after analysis of 67 HD chromosomes (Table 5-3a). Haplotype 1 is the most common haplotype in both affected individuals and controls (55.22% and 46.43% of all chromosomes, respectively), with no significant difference in haplotype frequency between the two populations. Haplotypes 1c and 1l, major haplotypes within this core haplotype, each account for 7.46% of all HD chromosomes analyzed (Table 5-3a).

Haplotype 2 is the second most frequent affected haplotype, observed on 35.82% of HD chromosomes but on only 10.71% of control chromosomes (Table 5-3a). This difference in haplotype frequency between the two populations is significant (p = 0.00029).
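Collapsing the extended haplotypes of Table 5-3a to their core (the alleles at the glutamic acid polymorphism and the CCG repeat) is a simple aggregation. The sketch below assumes the allele codes exactly as printed in the table and uses only the haplotype 2 rows; the variable and function names are hypothetical. It reproduces the 24 HD and 9 control chromosomes of core haplotype 2 and the 22.39% of HD chromosomes contributed by haplotype 2d.

```python
from collections import Counter
from typing import Dict, Tuple

# Key: (glutamic acid, GT70, CCG, D4S127, D4S95 AccI) alleles as printed in
# Table 5-3a; value: (HD chromosomes, control chromosomes).
Haplotype = Tuple[str, str, int, int, str]

HAPLOTYPE_2_ROWS: Dict[Haplotype, Tuple[int, int]] = {
    ("B", "A", 7, 0, "A"): (1, 0),   # 2a
    ("B", "B", 7, 0, "A"): (1, 0),   # 2b
    ("B", "A", 7, 1, "A"): (1, 0),   # 2c
    ("B", "A", 7, 2, "A"): (15, 5),  # 2d
    ("B", "A", 7, 3, "A"): (1, 1),   # 2e
    ("B", "A", 7, 4, "A"): (1, 0),   # 2f
    ("B", "A", 7, 5, "A"): (2, 0),   # 2g
    ("B", "A", 7, 2, "B"): (2, 1),   # 2h
    ("B", "A", 7, 5, "B"): (0, 2),   # 2i
}

TOTAL_HD = 67  # HD chromosomes with complete haplotypes (Table 5-3a)


def core(haplotype: Haplotype) -> Tuple[str, int]:
    """Core haplotype: the intragenic markers in disequilibrium with HD."""
    glutamic_acid, _gt70, ccg, _d4s127, _d4s95 = haplotype
    return glutamic_acid, ccg


def collapse_to_core(rows: Dict[Haplotype, Tuple[int, int]]) -> Tuple[Counter, Counter]:
    """Sum HD and control counts of extended haplotypes sharing a core."""
    hd_counts: Counter = Counter()
    control_counts: Counter = Counter()
    for haplotype, (n_hd, n_control) in rows.items():
        hd_counts[core(haplotype)] += n_hd
        control_counts[core(haplotype)] += n_control
    return hd_counts, control_counts


if __name__ == "__main__":
    hd, control = collapse_to_core(HAPLOTYPE_2_ROWS)
    print("core (B, 7): HD", hd[("B", 7)], "control", control[("B", 7)])
    share_2d = 100 * HAPLOTYPE_2_ROWS[("B", "A", 7, 2, "A")][0] / TOTAL_HD
    print(f"haplotype 2d: {share_2d:.2f}% of HD chromosomes")
```

Keeping the extended haplotype as the dictionary key means the same data can be regrouped by a different choice of core markers without re-entering the counts.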
One subhaplotype (haplotype 2d) alone accounts for 22.39% of all HD chromosomes and is the single most frequent HD haplotype observed.

One haplotype represents a common haplotype in the control population but is underrepresented in the HD population: haplotype 3 was seen significantly more frequently on normal chromosomes than on affected chromosomes (p = 0.00003) (Table 5-3a).

Table 5-3a. Haplotypes of HD and control chromosomes. G, glutamic acid polymorphism; 127, D4S127; 674A, D4S95 (AccI).

Haplotype  G   GT70  CCG  127  674A   HD chromosomes   Control chromosomes   χ²     P
                                      No.    %         No.    %
1          A   -     7    -    -      37     55.2      39     46.4           0.83   0.36
1a         A   A     7    0    A      0      0.0       2      2.4
1b         A   A     7    1    A      1      1.5       2      2.4
1c         A   A     7    2    A      5      7.5       5      6.0
1d         A   B     7    2    A      3      4.5       2      2.4
1e         A   A     7    3    A      4      6.0       2      2.4
1f         A   B     7    3    A      4      6.0       6      7.1
1g         A   A     7    4    A      0      0.0       1      1.2
1h         A   B     7    4    A      4      6.0       2      2.4
1i         A   A     7    5    A      4      6.0       3      3.6
1j         A   B     7    0    A      3      4.5       1      1.2
1k         A   A     7    2    B      2      3.0       2      2.4
1l         A   A     7    5    B      5      7.5       7      8.3
1m         A   B     7    3    B      1      1.5       1      1.2
1n         A   B     7    4    B      0      0.0       1      1.2
1o         A   A     7    0    B      1      1.5       2      2.4
2          B   -     7    -    -      24     35.8      9      10.7           13.1   0.00029
2a         B   A     7    0    A      1      1.5       0      0.0
2b         B   B     7    0    A      1      1.5       0      0.0
2c         B   A     7    1    A      1      1.5       0      0.0
2d         B   A     7    2    A      15     22.4      5      6.0
2e         B   A     7    3    A      1      1.5       1      1.2
2f         B   A     7    4    A      1      1.5       0      0.0
2g         B   A     7    5    A      2      3.0       0      0.0
2h         B   A     7    2    B      2      3.0       1      1.2
2i         B   A     7    5    B      0      0.0       2      2.4
3          A   -     10   -    -      4      6.0       30     35.7           17.2   0.00003
3a         A   A     10   0    A      1      1.5       2      2.4
3b         A   A     10   2    A      1      1.5       1      1.2
3c         A   A     10   4    A      0      0.0       1      1.2
3d         A   A     10   5    A      1      1.5       9      10.7
3e         A   A     10   0    B      0      0.0       1      1.2
3f         A   A     10   2    B      0      0.0       1      1.2
3g         A   A     10   4    B      0      0.0       2      2.4
3h         A   A     10   5    B      1      1.5       13     15.5
4          B   A     10   5    B      0      0.0       1      1.2            0.01   0.91
5          A   A     9    4    A      0      0.0       1      1.2            0.01   0.91
6          A   A     0    0    A      2      3.0       3      3.6            0.07   0.79
7          A   A     0    0    B      0      0.0       1      1.2            0.01   0.91
Total                                 67     100.0     84     100.0

Two subhaplotypes (haplotypes 3d and 3h) represent 10.71% and 15.48% of all control chromosomes respectively, yet less than 3% of all HD chromosomes.

Four haplotypes (haplotypes 4-7) are minor haplotypes, representing less than 3% of all HD chromosomes and 7% of control chromosomes.

Analysis of core haplotypes based on the intragenic polymorphisms demonstrating linkage disequilibrium with HD, i.e.
alleles at the glutamic acid polymorphism and the CCG repeat (Table 5-3a), suggests that two major haplotypes, haplotypes 1 and 2, underlie HD, and therefore that there is a minimum of two haplotypes underlying HD. Conversely, one major haplotype, haplotype 3, is seen on over 35% of normal chromosomes but rarely on HD chromosomes (p = 0.00003).

5.2.5 COMPARISON OF MEAN CAG LENGTH BETWEEN HAPLOTYPES

Analysis of the CAG repeat on 600 control chromosomes (Chapter 9) demonstrated that the mean CAG size is 18 repeats, with a range of 10 to 39 repeats. The distribution was, however, bimodal, with two peaks at 17 and 19 repeats. In this analysis, the mean CAG length was calculated on control chromosomes for the three major haplotypes shown in Table 5-3b. Of interest, the mean CAG length was higher for haplotypes 1 and 2 (18.8 and 21.1 repeats, respectively), which represent 61% of control chromosomes yet 94% of affected chromosomes. In contrast, the mean CAG length for haplotype 3, which is underrepresented on affected chromosomes compared with controls, was lower (16.9 repeats).

5.2.6 ANALYSIS OF NEW MUTATION HAPLOTYPES

Haplotypes were constructed for 7 HD chromosomes from families with new mutations for HD (Table 5-4).
These chromosomes had a CAG repeat within the intermediate range in the unaffected parent that expanded in the proband to a repeat length within the HD range.

Table 5-3b. Haplotypes of HD and control chromosomes with mean CAG length.

| Haplotype | G | CCG | HD no. | HD % | Control no. | Control % | Mean CAG length, HD chromosomes (± standard error) | Mean CAG length, control chromosomes (± standard error) |
|---|---|---|---|---|---|---|---|---|
| 1 | A | 7 | 37 | 56.92 | 39 | 50.00 | 44.7 (±0.9) | 18.8 (±0.5)* |
| 2 | B | 7 | 24 | 36.92 | 9 | 11.54 | 43.9 (±0.8) | 21.1 (±1.4)* |
| 3 | A | 10 | 4 | 6.15 | 30 | 38.46 | 46.3 (±2.2) | 16.9 (±0.4) |
| TOTAL | | | 65 | 100.00 | 78 | 100.00 | | |

* The mean CAG lengths of haplotypes 1 and 2 are significantly different from that of haplotype 3 (p = 0.00043 and p = 0.016 respectively).

Table 5-4. Haplotypes of HD chromosomes from new mutation families.

| Haplotype | G | GT70 | CCG | 127 | 674A | HD no. | Mean CAG length of parental IA chromosome | Mean length of HD chromosome | Mean CAG length of normal chromosomes |
|---|---|---|---|---|---|---|---|---|---|
| 1 | A | - | 7 | - | - | 7 | 34.6 (±1.2) | 44.7 (±0.9) | 19.0 (±0.5) |
| 1a | A | A | 7 | 2 | A | 2 | 34.5 | 48.5 | 16.6 |
| 1b | A | B | 7 | 2 | A | 1 | 30 | 43 | 21 |
| 1c | A | A | 7 | 3 | A | 2 | 34.0 | 46 | 19 |
| 1d | A | B | 7 | 3 | A | 1 | 39 | 52 | 20.6 |
| 1e | A | B | 7 | 4 | A | 1 | 36 | 42 | 21 |

Five different haplotypes were seen, although all had identical alleles at the intragenic glutamic acid polymorphism and the CCG repeat (Table 5-4). The intermediate size of the CAG repeat in the unaffected parent, the expanded size in the proband, and the mean CAG size of that particular haplotype in the normal population are given in Table 5-4. All HD haplotypes observed are derived from haplotypes seen in the normal population (Table 5-3a). It is notable that four of the new mutation haplotypes have larger than average CAG repeat sizes in the normal population.
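Both observations in the paragraph above (the summary row for haplotype 1 and the count of haplotypes whose normal-chromosome CAG length exceeds the population average) follow directly from the per-haplotype rows of Table 5-4. The Python sketch below is a minimal illustration under that reading, not part of the original analysis; the dictionary layout and function names are hypothetical, and the 18-repeat threshold is the control-population mean quoted earlier in the chapter.

```python
"""Summarize the per-haplotype rows of Table 5-4 (new-mutation HD chromosomes)."""

# Per extended haplotype: (number of HD chromosomes,
#   CAG length of the parent's intermediate allele,
#   mean CAG length of that haplotype on normal chromosomes).
# Values transcribed from Table 5-4; the data layout is illustrative only.
NEW_MUTATION_HAPLOTYPES = {
    "1a": (2, 34.5, 16.6),
    "1b": (1, 30.0, 21.0),
    "1c": (2, 34.0, 19.0),
    "1d": (1, 39.0, 20.6),
    "1e": (1, 36.0, 21.0),
}

# Mean CAG length on control chromosomes, as quoted in the text.
POPULATION_MEAN_CAG = 18.0


def mean_parental_ia_length(table):
    """Mean parental intermediate-allele CAG length, weighted by chromosome count."""
    total = sum(n for n, _, _ in table.values())
    return sum(n * ia_cag for n, ia_cag, _ in table.values()) / total


def haplotypes_above(table, threshold):
    """Haplotypes whose mean CAG length on normal chromosomes exceeds threshold."""
    return sorted(h for h, (_, _, normal_cag) in table.items() if normal_cag > threshold)


if __name__ == "__main__":
    n_total = sum(n for n, _, _ in NEW_MUTATION_HAPLOTYPES.values())
    print(f"{n_total} chromosomes, mean parental IA length "
          f"{mean_parental_ia_length(NEW_MUTATION_HAPLOTYPES):.1f}")
    print("above population mean:",
          haplotypes_above(NEW_MUTATION_HAPLOTYPES, POPULATION_MEAN_CAG))
```

Run as a script, this reports 7 chromosomes with a mean parental IA length of 34.6 repeats, matching the summary row, and lists 1b, 1c, 1d and 1e as the four haplotypes above the 18-repeat average.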
The core haplotype common to all these chromosomes (the glutamic acid polymorphism and the CCG repeat) has a mean CAG size on control chromosomes of 19 repeats, higher than the general population average. Thus, although all new mutation haplotypes share identical alleles at the glutamic acid polymorphism and the CCG repeat, there are various extended haplotypes that are prone to expansion from an intermediate CAG size to that of HD.

5.3 DISCUSSION

Repetitive sequence polymorphisms such as CA repeats or VNTRs have been thought to have higher mutation rates22 than simple polymorphisms such as the A/B presence or absence of a restriction enzyme site. Therefore, some reports have suggested that the power to detect linkage disequilibrium over time is greatest with simple A/B polymorphisms23,24. However, others argue against this concept, showing strong linkage disequilibrium with CA repeats25. The allelic associations seen between HD and both D4S127, a multi-allelic CA repeat, and D4S95, a single site polymorphism (Table 5-1a), demonstrate that on 4p16.3 both types of polymorphism close to the HD mutation can show a significant measure of allelic association. The strong allelic association seen with D4S127 is consistent with its proximity to the CAG repeat. The marker not showing association (GT70) is a single site polymorphism, suggesting that the lack of association seen is not due to the nature of the polymorphism.

In view of the association with markers 150 kb telomeric and 120 kb centromeric to the CAG repeat, the lack of linkage disequilibrium between HD and GT70, located approximately 50 kb centromeric to the CAG repeat, was somewhat surprising. This pattern of association contrasts with that of the CF gene, where markers flanking the predominant ΔF508 mutation demonstrate decreasing association with increasing distance from the mutation4. However, there is precedent for more irregular patterns of association across a genomic region.
For example, in the NF1 gene, 6 markers spanning 280 kb are in complete linkage disequilibrium with one another, yet disequilibrium decreases substantially for a marker only 68 kb distal to this set. Similarly, in the insulin receptor gene, a polymorphism in the last exon shows association only with the adjacent polymorphism and not with any of the other upstream polymorphisms tested21.

There are several possible explanations for the lack of allelic association between HD and the close marker GT70 in this analysis. Firstly, a high mutation rate at this particular locus would quickly dissolve any allelic association originally present on affected chromosomes. It is likely that the AccI site polymorphism detected by the exonic probe GT70 is intronic, and therefore not under the same evolutionary constraints as an exonic polymorphism. Thus, the allelic association between HD and the GT70 allele of the founding chromosomes would no longer be detectable because of subsequent frequent mutation events at that locus. Alternatively, and most difficult to assess, is the role that selection, drift, admixture, gene conversion and mutation play in determining association. These effects cannot necessarily be overcome by analyzing a larger sample set26. Replication of these results in populations with different evolutionary histories is one way to confirm the significance of these findings.

Initial reports suggested a single origin for the HD mutation27,28. Genealogical analyses suggested that northwestern Europe (France, Germany or Holland) was the source of the original HD mutation, as affected individuals in countries such as Canada, South Africa, Venezuela and Australia all traced their roots back to these European countries28. Haplotype analysis prior to the identification of the HD gene demonstrated a core haplotype underlying two-thirds of HD chromosomes, and this was presented as evidence for more than one, but a limited number of, founder HD chromosomes29.
However, subsequent haplotype analysis of HD chromosomes from several homogeneous populations demonstrated multiple HD haplotypes within each population, and it was suggested that multiple mutation events underlie the disorder30,31.

In this cohort of HD individuals, analysis of HD haplotypes clearly demonstrates that more than one founder chromosome underlies HD (Table 5-3a). Haplotypes 1 and 2 represent 90% of all HD chromosomes and 57% of control chromosomes. The allelic differences between the most frequent affected core haplotype (haplotype 1) and the next most frequent (haplotype 2) suggest that there are at least 2 different chromosomal backgrounds on which expansion of the CAG repeat could have occurred. This is consistent with previous analyses indicating multiple origins for HD chromosomes6. The strong measures of linkage disequilibrium between HD and alleles at the glutamic acid site, the CCG repeat, D4S127 and D4S95 (AccI), however, do suggest that the number of different ancestral mutations underlying HD is still small enough for linkage disequilibrium to be detected. Development of other intragenic polymorphisms will enable further resolution of HD founder chromosomes.

The difference in frequency of haplotype 2 between the control and affected populations suggests that this haplotype was prone to expansion of the CAG repeat, leading to HD. It is possible that difficulty in matching control chromosomes accounts for the observed significance. However, the higher mean CAG size of control individuals with this haplotype suggests that, as in FRAXA and DM, it is the reservoir of chromosomes with higher than average repeat sizes that is prone to further expansion, eventually leading to the disease state. Therefore these chromosomes represent a frequent haplotype in the HD population.
This is consistent with a multi-step model in which a subgroup of unstable normal alleles, with a particular haplotype and a CAG size in the high normal range, is liable to expand, producing a chromosome with a CAG repeat in the HD range. Similarly, haplotype 1 is more common on HD than on control chromosomes; although this difference is not significant, the significantly greater length of the CAG repeats on normal chromosomes with this haplotype suggests that CAG repeat length is a factor in the instability leading to disease.

Haplotype 3 represents 35.71% of control chromosomes, yet is rarely seen on HD chromosomes (Table 5-3a). It is of interest that haplotypes with this core have a 10 allele at the CCG repeat, suggesting that these haplotypes are somehow more stable than those with 7 CCG repeats and do not expand to become HD chromosomes. The average CAG length of haplotype 3 is 16.9 repeats, and this relatively small CAG size may be a factor in the stability of these chromosomes.

Analysis of chromosomes from families with new mutations for HD provides the opportunity to observe the rise of an HD chromosome from an intermediate size CAG repeat. The 7 HD alleles derived from intermediate alleles in this analysis are associated with 5 different haplotypes; however, all have the same core haplotype (glutamic acid polymorphism and CCG repeat). Therefore, there are multiple chromosomal backgrounds, perhaps derived from one ancestral chromosome, that are prone to expansion resulting in HD chromosomes. The CAG sizes of four of these haplotypes on normal chromosomes are higher than average (19 to 21 repeats), further supporting the hypothesis that CAG length is a factor in the instability of the CAG repeat, leading to expansion and HD.

The results of this analysis suggest that the mechanism of repeat expansion is a common one, occurring on several different chromosomal backgrounds.
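The haplotype frequency differences discussed above rest on the comparisons in Table 5-3a. The thesis does not state which test produced the tabulated statistics. A Yates-corrected 2×2 chi-square on chromosome counts reproduces the values for haplotype 1 (0.83, p = 0.36) and haplotype 3 (17.2, p = 0.00003), whereas for haplotype 2 the same formula gives about 12.3 rather than the tabulated 13.1, so the choice of test here is an assumption. The Python sketch below is illustrative only; the function names are hypothetical.

```python
import math

# Chromosome totals from Table 5-3a.
HD_TOTAL = 67
CONTROL_TOTAL = 84


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability with 1 degree
    of freedom. Uses the textbook form N(|ad - bc| - N/2)^2 divided by the
    product of the row and column totals, without clamping the correction.
    """
    n = a + b + c + d
    chi2 = n * (abs(a * d - b * c) - n / 2) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # A 1-df chi-square is a squared standard normal: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def compare_haplotype(hd_with, control_with):
    """Test whether a haplotype's frequency differs between HD and control chromosomes."""
    return yates_chi_square(hd_with, HD_TOTAL - hd_with,
                            control_with, CONTROL_TOTAL - control_with)


if __name__ == "__main__":
    # (haplotype, HD chromosomes carrying it, control chromosomes carrying it)
    for name, hd_with, control_with in [("1", 37, 39), ("3", 4, 30)]:
        chi2, p = compare_haplotype(hd_with, control_with)
        print(f"haplotype {name}: chi2 = {chi2:.2f}, p = {p:.2g}")
```

The rows of each 2×2 table are HD and control chromosomes, and the columns are chromosomes with and without the haplotype, so only the two "NO." columns of Table 5-3a are needed.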
CAG length is one factor, although likely not the only one, contributing to the instability of the repeat. The data presented here suggest that chromosomes with CAG repeats at the high end of the normal range can undergo small expansions into the range of intermediate alleles (IAs), which are then prone to further expansion into the range associated with disease.

The identification of additional polymorphic markers within the HD gene would aid further refinement of the patterns of association within the HD gene. Furthermore, the development of additional markers more closely flanking the CAG repeat associated with HD will help in determining more accurately the number and basis of ancestral founder HD chromosomes.

5.4 REFERENCES

1. Hill WG and Robertson A (1968). Linkage disequilibrium in finite populations. Theor Appl Genet 38:226-231.
2. Jorde LB, Watkins WS, Viskochil D, O'Connell P, Ward K (1993). Linkage disequilibrium in the Neurofibromatosis 1 (NF1) region: Implications for gene mapping. Am J Hum Genet 53:1038-1050.
3. Kerem B-S, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui L-C (1989). Identification of the cystic fibrosis gene: Genetic analysis. Science 245:1073-1080.
4. Buetow KH, Shiang R, Yang P, Nakamura Y, Lathrop GM, White R, Wasmuth JJ, Wood S, Berdahl LD, Leysens NJ, Ritty TM, Wise M, Murray JC (1991). A detailed multipoint map of human chromosome 4 provides evidence for linkage heterogeneity and position-specific recombination rates. Am J Hum Genet 48:911-925.
5. Allitto BA, MacDonald ME, Bucan M, Richards J, Romano D, Whaley WL, Falcone B, Ianazzi J, Wexler NS, Wasmuth JJ, Collins FS, Lehrach H, Haines JL, Gusella JF (1991). Increased recombination adjacent to the Huntington disease-linked D4S10 marker. Genomics 9:104-112.
6. Andrew SE, Theilmann J, Hedrick A, Mah D, Weber B, Hayden MR (1992). Nonrandom association and two loci separated by about 3 Mb on 4p16.3. Genomics 13:301-311.
7. MacDonald ME, Lin C, Srinidhi L, Bates G, Altherr M, Whaley WL, Lehrach H, Wasmuth J, Gusella JF (1991). Complex patterns of linkage disequilibrium in the Huntington disease region. Am J Hum Genet 49:723-734.
8. Imbert G, Kretz C, Johnson K, Mandel JL (1993). Origin of the expansion mutation in myotonic dystrophy. Nature Genet 4:72-76.
9. Richards RI, Holman K, Friend K, Kremer E, Hillen D, Staples A, Brown WT, Goonewardena P, Tarleton J, Schwartz C, Sutherland GR (1992). Evidence of founder chromosomes in fragile X syndrome. Nature Genet 1:257-260.
10. Oudet C, Mornet E, Serre JL, Thomas F, Lentes-Zengerling S, Kretz C, Deluchat C, Tejada I, Boue J, Boue A, Mandel JL (1993). Linkage disequilibrium between the fragile X mutation and two closely linked CA repeats suggests that fragile X chromosomes are derived from a small number of founder chromosomes. Am J Hum Genet 52:297-304.
11. Zhong N, Dobkin C, Brown WT (1993). A complex mutable polymorphism located within the fragile X gene. Nature Genet 5:248-253.
12. MacPherson JN, Bullman H, Youings SA, Jacobs PA (1994). Insert size and flanking haplotype in fragile X and normal populations: possible multiple origins for the fragile X mutation. Hum Mol Genet 3:399-405.
13. Rommens JM, Lin B, Hutchinson GB, Andrew SE, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Nasir J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Schappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.
14. Weir BS (1990). Genetic data analysis. Sinauer, Sunderland, MA.
15. Wasmuth JJ, Hewitt J, Smith B, Allard D, Haines JL, Skarecky D, Parlow E, Hayden MR (1988). A highly polymorphic locus very tightly linked to the Huntington disease gene. Nature 332:734-736.
16. Taylor SAM, Barnes GT, MacDonald ME, Gusella JF (1992). A dinucleotide repeat polymorphism at the D4S127 locus. Hum Mol Genet 1:142.
17. Andrew SE, Goldberg YP, Theilmann J, Zeisler J, Hayden MR (1994). A CCG repeat polymorphism adjacent to the CAG repeat in the Huntington disease gene: Implications for diagnostic accuracy and predictive testing. Hum Mol Genet (in press).
18. Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman MA, Graham RK, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington disease. Nature Genet 4:398-403.
19. Duyao M, Ambrose C, Myers R, Novelletto A, Persichetti F, Frontali M, Folstein S, Ross C, Franz M, Abbott M, Gray J, Conneally P, Young A, Penney J, Hollingsworth Z, Shoulson I, Lazzarini A, Falek A, Koroshetz W, Sax D, Bird E, Vonsattel J, Bonilla E, Alvir J, Bickman Conde J, Cha J-H, Dure L, Gomez F, Ramos M, Sanchez-Ramos J, Snodgrass S, de Young M, Wexler N, Moscowitz C, Penchaszadeh G, MacFarlane H, Anderson M, Jenkins B, Srinidhi J, Barnes G, Gusella J, MacDonald M (1993). Trinucleotide repeat length instability and age of onset in Huntington's disease. Nature Genet 4:387-392.
20. Snell R, MacMillan JC, Cheadle JP, Fenton I, Lazarou LP, Davies P, MacDonald ME, Gusella JF, Harper PS, Shaw DJ (1993). Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington disease. Nature Genet 4:393-397.
21. MacDonald ME, Duyao M, Ambrose C, Barnes G, Srinidhi J, Myers R, Gusella J (1993). A codon deletion in the Huntington's disease gene is associated with the major HD chromosome haplotype. Am J Hum Genet 53:A80.
22. Jeffreys AJ, Royle NJ, Wilson V, Wong Z (1988). Spontaneous mutation rates to new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature 332:278-281.
23. Elbein SC (1992). Linkage disequilibrium among RFLPs at the insulin-receptor locus despite intervening Alu repeat sequences. Am J Hum Genet 51:1103-1110.
24. Hastbacka J, Chappelle A, Kaitila I, Sistonen P, Weaver A, Lander E (1992). Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nature Genet 2:204-211.
25. Pandolfo M, Sirugo G, Antonelli A, Weitnauer L, Ferretti L, Leone M, Dones I, Cerino A, Fujita R, Nanauer A, Mandel JL, Di Donato S (1990). Friedreich ataxia in Italian families: Genetic homogeneity and linkage disequilibrium with the marker loci D9S5 and D9S15. Am J Hum Genet 47:228-235.
26. Kaplan N and Weir BS (1992). Expected behaviour of conditional linkage disequilibrium. Am J Hum Genet 51:333-343.
27. Conneally PM, Haines JL, Tanzi RE, Wexler NS, Penchaszadeh GA, Harper PS, Folstein SE, Cassiman JJ, Myers RH, Young AB, Hayden MR, Falek A, Tolosa ES, Crespi S, Di Maio L, Holmgren G, Anvret M, Kanazawa I, Gusella JF (1989). Huntington disease: No evidence for locus heterogeneity. Genomics 5:304-308.
28. Hayden MR (1981). Huntington's disease. Springer Verlag, New York.
29. MacDonald ME, Lin C, Srinidhi L, Bates G, Altherr M, Whaley WL, Lehrach H, Wasmuth J, Gusella JF (1991). Complex patterns of linkage disequilibrium in the Huntington disease region. Am J Hum Genet 49:723-734.
30. Andrew SE, Theilmann J, Almqvist E, Norremolle A, Lucotte G, Anvret M, Sorenson SA, Turpin JC, Hayden MR (1993). DNA analysis of distinct populations suggests multiple origins for the mutation causing Huntington disease. Clin Genet 43:286-294.
31. Almqvist E, Andrew SE, Theilmann J, Goldberg P, Zeisler J, Drugge U, Grandell U, Tapper-Persson M, Winblad B, Hayden MR, Anvret M. Geographical distribution of haplotypes in Swedish families with Huntington disease. Human Genetics (in press).

CHAPTER 6
IDENTIFICATION OF AN ALU RETROTRANSPOSITION EVENT

The work presented in this chapter has contributed to three publications.

Goldberg YP, Rommens JM, Andrew SE, Hutchinson GB, Lin B, Theilmann J, Graham R, Glaves M, Starr E, McDonald H, Nasir J, Schappert K, Kalchman M, Clarke LA, Hayden MR (1993). Identification of an Alu retrotransposition event in close proximity to a strong candidate gene for Huntington's disease. Nature 362:370-373.

Hutchinson GB, Andrew SE, McDonald H, Goldberg YP, Graham R, Rommens JM, Hayden MR (1993). An Alu element insertion in two families with HD defines a new active Alu subfamily. Nucl Acids Res 21:3379-3383.

Rommens JM, Lin B, Hutchinson GB, Andrew SE, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Nasir J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Schappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.

6.1 INTRODUCTION

After a disease locus has been mapped to a chromosomal location, the region of the genome requiring further study to identify the genes present may still be relatively large (1-2 Mb or more). Systematic and reliable identification of coding regions within extensive genomic regions is difficult, as genes are irregularly dispersed and may contain many exons. Selective expression of a particular gene with respect to tissue type or developmental stage may also complicate retrieval of cDNA.

Previous methods of searching for genes, through the identification of CpG-rich sequences and the demonstration of phylogenetic conservation over the region of interest, are labour intensive and require subsequent experimentation to obtain cDNAs for sequence analysis. More recent approaches to identifying genes, including exon trapping1,2 and cDNA selection procedures3,4, are more sensitive and greatly expedite cloning strategies.

6.1.1 GENE TRACKING

To identify transcribed sequences within the proximal candidate region, a collaboration with Dr. J. Rommens was established. We identified and cloned transcribed segments contained within a 1 Mb region localized around D4S95, using a modified direct cDNA selection scheme termed Gene Tracking (Figure 6-1)5,6.
The Gene Tracking method involves preparation of cDNA from an appropriate tissue using primers that can subsequently be used in the polymerase chain reaction (PCR) to amplify and clone specific cDNAs. The cDNA pool is blocked for repetitive sequences and hybridized exhaustively with immobilized, purified YAC or cosmid DNA from the candidate region. After two rounds of prolonged hybridization, the eluted cDNAs were amplified by PCR and cloned to yield a library of selected cDNAs for each YAC. The clones of these libraries were arrayed and screened for the presence of repetitive sequence, and the remaining clones were individually hybridized to EcoRI digests of human DNA, of human-hamster hybrid DNA containing human chromosome 4, and of YAC clone DNA, in order to confirm their origin. The clones were then hybridized to each other to test uniqueness, hybridized to a series of additional overlapping YACs for physical mapping, hybridized to RNAs of tissues or cell lines, and finally characterized by sequencing.

Figure 6-1. Gene Tracking. Immobilized YACs from Region 1 are hybridized with brain and tissue-mix cDNAs, followed by elution and preparation of minilibraries and selection of clones (checked against a chromosome 4 somatic cell hybrid). Transcription units are then assembled through YAC mapping and assignment to BINS, Northern blot, cDNA screening and sequencing, and sequence analysis (BLAST, SORFIND, PYTHIA, GRAIL).

In this collaboration, cDNA was produced from frontal cortex tissue and from pools of cDNAs from four tissue sources: fetal brain, frontal cortex, bone marrow and liver. A high proportion (between 50% and 90%) of clones were found by hybridization to originate from chromosome 4 and from the original chromosome 4 YAC. The cDNA clones were termed "GT clones", an abbreviation for "gene tracked". In this manner a total of 53 GT clones were isolated.
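Testing uniqueness by hybridizing clones to each other amounts to partitioning them into groups: if clone A cross-hybridizes with B, and B with C, all three represent the same sequence even if A and C were never compared directly. A minimal union-find sketch of that grouping step is shown below. The function and variable names are hypothetical, and the example links are a handful of the "same as" and "overlaps with" relationships recorded in Table 6-1; in the study itself, assignment to transcription units also drew on physical mapping and RNA hybridization, which this sketch does not model.

```python
def group_clones(clones, links):
    """Partition clones into groups; clones joined by any chain of links share a group.

    `links` are pairs of clones that cross-hybridize or overlap in sequence.
    Implemented as a union-find (disjoint-set) structure with path halving.
    """
    parent = {clone: clone for clone in clones}

    def find(clone):
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving keeps trees shallow
            clone = parent[clone]
        return clone

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for clone in clones:
        groups.setdefault(find(clone), []).append(clone)
    return list(groups.values())


# Example input: a few "same as" / "overlaps with" entries taken from Table 6-1.
EXAMPLE_CLONES = ["GT23", "GT49", "GT90", "GT93", "GT129", "GT130",
                  "GT67", "GT71", "GT70", "GT63", "GT123", "GT126", "GT44"]
EXAMPLE_LINKS = [("GT49", "GT23"), ("GT90", "GT23"), ("GT93", "GT23"),
                 ("GT129", "GT23"), ("GT130", "GT23"), ("GT71", "GT67"),
                 ("GT63", "GT70"), ("GT126", "GT123")]

if __name__ == "__main__":
    for group in group_clones(EXAMPLE_CLONES, EXAMPLE_LINKS):
        print(group)
```

With this input the thirteen clones collapse into five groups, including one six-member group around GT23 and GT44 on its own.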
The structural integrity of the human DNA within the YAC was confirmed by comparing the hybridization patterns observed for each clone with YAC DNA and with human and human-hamster hybrid DNAs. For each GT clone, the hybridizing EcoRI fragments of the YAC DNAs corresponded to those observed with human total genomic and chromosome 4 DNA.

The series of overlapping YACs spanning the region was used to divide the 1 Mb region into non-overlapping physical intervals (BINS), each containing a particular stretch of DNA, as depicted in Figure 6-2. BIN 3, which contains D4S127, D4S95 and D4S182, was subdivided into 3 compartments, BIN 3A, 3B and 3C, using additional overlapping YACs. The refined position of each cDNA was deduced from its hybridization pattern to this array of YACs.

Figure 6-2. Mapping of transcriptional units within the proximal candidate region. Overlapping regions within YACs 353G6, 70D11 and 2A11 were used to define 5 separate BINS. YACs A187G12 and D102A10 were used to further refine BIN 3 into three separate regions: A, B, C. GT clones were mapped by hybridization to digested YAC DNA and assigned to BINS accordingly.

The clones were categorized into transcription units by refined physical mapping and by cross-hybridization to each other and to RNA from a variety of tissues, including those initially used to select the cDNA fragments. Direct sequence analysis revealed that several clones contained open reading frames, although for other clones no open reading frame could be detected. The combined information from RNA hybridization and physical mapping clearly indicated that some of the GT clones were portions of the same transcription units.
Based on their expression pattern and size, a total of nine different mRNAs were detected. The GT clones, their localization by BIN, transcript size where observed, and sequence analysis of the clones are given in Table 6-1.

6.1.2 GT CLONE ANALYSIS

The large number of cDNA clones (GT clones) isolated made it necessary to rank them by their potential as candidates for further analysis. Refined physical mapping of these clones, by hybridization to different YACs and to a cosmid-phage contig from this region, as well as long-range mapping by pulsed field electrophoresis, identified a subset of clones mapping to the proximal candidate region between D4S127 and D4S182 (BIN 3). This region contains D4S95 and also encompasses the DNA markers that form the core haplotype present on about one third of disease chromosomes7. A total of 20 cDNA clones mapped to BIN 3 (Table 6-1).

GT clones from BIN 3 were tested for coding potential on Northern blots, to determine transcript size and any difference in expression between control and affected individuals. GT clones showing coding potential, with multiple bands on Southern blot hybridization representing multiple exons, were also of interest. In addition, GT clones with excellent coding potential according to computer analysis of their sequence by GRAIL (Gene Recognition Analysis Internet Link) were treated as good candidates.
For example, the transcription unit detected by GT70 had excellent coding potential, detected several genomic fragments, hybridized to two distinct RNA species and also detected DNA polymorphisms, making it a strong candidate for the HD gene.

Table 6-1. Summary of characterization of 53 retrieved cDNA fragments.

| BIN | GT clone | Size (bp) | EcoRI size (kb) | RNA hybridization: size (kb), distribution | Sequence analysis |
|---|---|---|---|---|---|
| 1A | 168 | 650 | 7.0 | 5.5; W, Fi, C, B, Co | DB similarity, HUMPHPLA2 (phospholipase A2) |
| 1B | 67 | 600 | 12.0 | absent | DB similarity, MITGTRN6 (yeast mitochondrion) |
| | 71 | 912 | 12.0 | absent | Same as GT67 |
| | 65 | 207 | 2.9 | absent | DB search neg. |
| | 69 | 976 | 3.8 | absent | DB search neg. |
| | 68 | 573 | 9.5 | absent | DB search neg.; L1 and MER12 repeats present |
| | 167 | 500 | 9.5 | absent | DB search neg.; sequence overlaps with GT68 |
| | 86 | 600 | 9.5 | | DB search neg.; sequence overlaps with GT68 |
| | 166 | 550 | 12.0 | 4.5, similar to GT88 | Not sequenced |
| | 88 | 600 | 6.0 | 4.5, similar to GT166 | Not sequenced |
| | *149 | 584 | 6.0, 5.0 | 11.0, 13.0; K, Co, Fi, L, W, C | DB search neg.; coding potential excellent |
| 2 | *70 | 757 | 9.0, 8.5, 1.2 | 11.0, 13.0; L, F, C, W, B | DB search neg.; coding potential excellent |
| | *63 | 600 | 9.0, 1.2 | | Same as GT70 |
| | 87 | 536 | 8.5 | absent | DB search neg. |
| | 66 | 644 | 10.0 | absent | DB search neg. |
| | 165 | 600, 550 | 11.5, 4.2 | absent | DB search neg.; sequence overlaps with GT66 |
| | 54 | 757 | 2.7 | | DB search neg.; Alu and MER repeats present |
| | 72 | 764 | 2.8 | absent | DB search neg. |
| | 189 | 695, 578 | 11.0, 6.0 | absent | DB search neg.; composite clone |
| 3B | 23 | 551 | 14.0 | absent | DB search neg.; MIR repeat present |
| | 49 | 592 | 14.0 | | Same as GT23 |
| | 90 | 532 | 14.0 | absent | Same as GT23 |
| | 93 | 595 | 14.0 | | Same as GT23 |
| | 129 | 589 | | | Same as GT23 |
| | 130 | 597 | | | Same as GT23 |
| | 136 | 500 | 14.0, 7.5 | absent | DB search neg. |
| | 44 | 646 | 13.7 | absent | DB search neg. |
| | 48 | 550 | 14.0 | absent | DB search neg. |
| | 45 | 516 | 9.0 | 3.8; W, L, F, C, Co | DB search neg.; 2 ORFs; Alu repeat present |
| | 34b | 500 | | | |
| | 172 | 560 | 5.0 | | Sequence overlaps with GT45 |
| | 157 | 500 | 6.4 | absent | |
| | 30 | 490 | 13.0 | | DB search neg.; Alu repeat present |
| | 138 | 458 | 13.0 | absent | DB similarity HSIL1AG; Alu repeat present |
| | 169 | 600 | 8.0, 14.0 | | DB search neg.; Alu repeat |
| | 127 | 550 | 16.0, 14.0, 7.5 | | DB search neg.; sequence overlaps GT169 |
| | 170 | 550 | 14.0 | | |
| | 53 | 550 | 16.0 | | DB search neg.; L1 repeat present |
| 3C | 24 | 667 | 6.5 | 12.0 | DB search neg.; similarity to L1 repeat |
| 4 | 128 | 480 | | absent | DB search neg. |
| | 131 | 422 | 5.5 | | DB similarity TFD:P0136; 422 bp ORF |
| | 164 | 443, 250 | 5.5 | | DB search neg.; 443 bp ORF, excellent coding potential |
| | 154 | 439 | 5.5 | | DB search neg.; 439 bp ORF, excellent coding potential |
| | 159 | 400 | | | DB search neg.; coding potential good |
| | 43 | 495 | | absent | DB search neg. |
| | 133 | 480 | | absent | DB search neg. |
| 5B | 98 | 450 | 14.0, 9.0 | 3.6, wide distribution | |
| | 123 | 447 | 14.0, 9.0, 4.1 | 1.8, 3.6; Fi, L, W, C, Co | DB search neg.; ORF, good coding potential |
| | 126 | 352 | 14.0, 9.0 | | Sequence overlaps with GT123 |
| | 125 | 662 | 7.5 | absent | DB search neg. |
| | 137 | 500 | 9.0 | absent | DB search neg. |
| | 160 | 500 | 4.2 | absent | DB search neg.; 179 bp ORF, good coding potential |
| | 161 | 349 | 9.0 | 3.8; Fi, L, W | DB match HUMXT01095 (EST), identical |

The clones are listed by GT number according to their BIN assignments. The clone size and the EcoRI genomic fragments detected with these clones are also listed. Sizes of mRNAs detected in the tissues are given in kb: K = kidney, Co = Cos cells, Fi = fibroblasts, L = lymphoblasts, W = HL60 cells, C = Caco-2 cells, B = bone marrow, F = frontal cortex, FB = fetal brain. Partial overlaps between clones were determined by cross-hybridization or sequence analysis. Database (DB) searches were carried out against the non-redundant nucleic acid and protein databases of GenBank as well as the dbEST and Transcription Factor databases.

Many disease-causing mutations are genomic rearrangements that can be detected by Southern blot analysis. A screen of DNA from a battery of 250 unrelated patients, digested with two different enzymes, was undertaken in the laboratory by Jane Theilmann, beginning with the GT clones that map to BIN 3.

6.2 RESULTS

6.2.1 GT48 GENOMIC REARRANGEMENT

One GT clone, GT48, located 120 kb from D4S95, detected a genomic rearrangement in 2 of 250 patients on Southern blots of MspI-digested DNA. This altered band (a 1.7 kb MspI fragment) segregated with HD in both families (Figures 6-3a, 6-3b). Interestingly, in one of these families (Figure 6-4a) a recombination event places the HD gene distal to D4S125.
This recombination event in an affected individual from a family with a clearly established diagnosis of HD reduced the candidate region by indicating that the defective gene must be distal to D4S125, thus redefining the proximal boundary for the gene (Figure 6-4a).

The rearrangement occurred on the same haplotype in both families (Figures 6-4a, 6-4b), and this haplotype was unique among 140 HD families, suggesting a common origin for the rearrangement. Both families were of Scottish origin, with their ancestors living 50 km apart.

Figure 6-3a. Genomic rearrangement in one of two families with Huntington disease. Southern blot analysis of MspI-digested genomic DNA probed with GT48 revealed an altered 1.7 kb MspI fragment in affected individuals from this family.

Figure 6-3b. Genomic rearrangement co-segregating with HD in the second family. Southern blot analysis of MspI-digested genomic DNA probed with GT48 shows an altered 1.7 kb MspI fragment co-segregating with HD in all affected individuals and in individuals predicted to be at high risk of having inherited the HD gene.

Figure 6-4a. Recombination within an HD family refining the proximal boundary of the HD candidate region. The affected haplotype in this family is designated within the boxed region. Recombination between markers D4S180 and D4S127 in individual II-6 shifts the proximal boundary at least 200 kb towards the telomere, from D4S125 (YNZ32) to D4S180.

Figure 6-4b. Haplotype analysis of the second family demonstrating the insertion detected by GT48. The affected haplotype in this family is designated within the boxed region. Individual II-2 has inherited the HD chromosome but does not yet appear affected.

Figure 6-5a. Mapping of GT clones. GT44, GT48 and GT49 mapped to both A187G12 and D102A10 as well as 70D11, indicating their position in BIN 3C. All three clones are contained within a 15 kb lambda phage, λGT48, isolated using GT48 as a probe. λGT48 contains a HindIII polymorphism (*) detected by both GT44 and GT48.

6.2.2 GENOMIC CLONING OF ALU RETROTRANSPOSITION

In order to map and localize the genomic rearrangement further, a λ phage was isolated by Dr. Rommens using GT48 as a probe (Figure 6-5a). Detailed restriction mapping of λGT48 in control DNA, determined from single and double digestions using HindIII, TaqI, MboI, MspI, PstI and XbaI and hybridization with GT48, is shown in Figure 6-5b.
A polymorphic HindIII site, detected by GT48 and by an adjacent GT clone, GT44, was identified (Figures 6-5b, 6-5c).

Analysis of restriction mapping patterns from affected individuals with the insertion demonstrated that the rearrangement was detected with GT48 but not with GT49, localizing it to the 1.2 kb polymorphic HindIII fragment (Figure 6-5b).

The genomic rearrangement was seen in genomic DNA digested with multiple enzymes (HincII, HindIII, MboI, TaqI) and hybridized with GT48 (Figure 6-5d). Using GT48 as a probe, the rearrangement was not observed on Southern blots of PstI-digested DNA but was detected in MspI-digested DNA. Thus, the site of rearrangement was further localized within the 1.2 kb HindIII fragment to the 200 bp MspI-PstI fragment (Figure 6-5b). Fine restriction mapping of the genomic region around GT48, together with the altered restriction fragment sizes in the affected individuals, strongly suggested that the genomic rearrangement was an insertion.

The 1.2 kb HindIII fragment from λGT48 was subcloned and sequenced. Primers flanking the insertion site were designed, and a PCR assay to detect the inserted element was established (Figure 6-6a). The PCR primers are as follows:

primer A: 5′-ATGTA[illegible]CAGGACATGTGGC-3′
primer B: 5′-AAATAACATCCAGAATC[illegible]CAGAT-3′

PCR conditions were 3 mM MgCl2, 50 mM KCl, 20 mM Tris pH 8.4, 200 µM of each dNTP, 0.5 µM of each primer and 1.25 U of Taq DNA polymerase per 25 µl reaction. Thermal cycling conditions were 95 °C for 45 s, followed by 35 cycles of 94 °C for 1 min, 54 °C for 30 s and 72 °C for 1 min, with a final extension at 72 °C for 10 min.
PCR products were resolved on 1% agarose gels in 1X TBE at 100V for 1 hour.

[Figure 6-5b: restriction map of λGT48 showing GT44, GT48 and GT49, the restriction sites and the boxed 1.2kb HindIII fragment; scale bar 1 kb.]
Figure 6-5b. Restriction map of λGT48, a 15kb clone containing the site of Alu insertion detected by GT48 in two Huntington disease families. The locations of GT48 and the adjacent GT clones GT49 and GT44 are shown. The Alu insertion site was localized to a 1.2kb HindIII fragment, shown boxed. Using GT48 as a probe, the rearrangement was not detected with PstI, yet was detected with MspI, thus further localizing the insertion to the MspI-PstI fragment within the 1.2kb HindIII fragment. P = PstI, E = EcoRI, X = XbaI, M = MspI, T = TaqI, Mb = MboI, H = HindIII. The polymorphic HindIII site is marked (*).

[Figure 6-5c: Southern blot of YAC DNAs hybridized with GT44; size markers 3.5 and 2.4 kb.]
Figure 6-5c. Hybridization of GT44 to YAC DNAs. Southern blot of DNA from YACs 353G6, 70D11, A187G12, D102A10 and 2A11 digested with HindIII localizes GT44 to BIN 3B and detects the HindIII polymorphism.

[Figure 6-5d: Southern blots of HincII, HindIII, MboI and TaqI digests (lanes 1, 2 and C); size markers 7.0, 4.3, 3.5, 2.0, 1.3 and 1.1 kb.]
Figure 6-5d. Genomic DNA digested with multiple restriction enzymes and probed with GT48 resulted in altered bands of equal size in the affected individuals from each family. Lane 1, family a; lane 2, family b; lane C, control.

[Figure 6-6a: agarose gel of PCR products, lanes 1 to 9; bands at 460 bp and 118 bp.]
Figure 6-6a. Primers flanking the insertion site were designed. These primers generate a 118bp fragment in normal individuals (lanes 6-9) and, in addition, a 460bp product in five affected individuals from both families (lanes 1-5). The PCR product was subcloned and sequenced.

[Figure 6-6b: positions of GT44, GT48 and GT49 with the polymorphic HindIII site (*), and the sequence across the insertion site:]
AATTTCTTCTTGTTTAAGAGTATGCTGGCCGGGCGCGGTGGCTCACGCCTGTAATCCCGCACTTTGGGAGGCCGGGCGGGTGGATCATGGGTCAGGAACAAAAAATTACCGGGCGCGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAkCCCGGGAAGCGGAGCTTTCGTGAGCCGAGkTTGCGCCACTGCAGTCCGCAGTCCGGCCTGGGCGCAGGCAA.GA.CTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAGTATGCTGATTGATATTTGTTCATCATGGG
Figure 6-6b. The 1.2kb HindIII fragment was subcloned and sequenced, and primers flanking the site of insertion were derived. PCR products from affected individuals with the insertion were cloned and sequenced.
The inserted sequence shown here represents a full-length Alu element (bold) and the insertion site is flanked by a 9 bp direct repeat (underlined).

These primers generated a 118 base pair fragment in normal individuals and a 460 base pair product in five affected individuals from both representative families. The 460 base pair product was subcloned using TA cloning (according to the Invitrogen protocol) and sequenced (Sequenase).

Cloning and sequence analysis of the rearrangement in both families demonstrated an insertion element of 331 base pairs between the MspI and PstI sites of the 1.2kb HindIII fragment. The insertion element is a member of the Alu family of mobile repetitive elements (Figure 6-6b). The Alu was identical in both families and had inserted at the identical nucleotide position in both. These observations, together with the tracing of both genealogies to the same area of Scotland, suggested that a single retrotransposition event had occurred and was seen in two branches of the same family.

The core chromosomal haplotype seen in both families with the Alu insertion, which extends for about 1 Mb and includes alleles at D4S95 and D4S98, is seen in 2% (14/687 chromosomes) of control DNA banked in Vancouver. The PCR assay designed to detect this insertion demonstrated that none of the 14 individuals from our cohort with this rare haplotype had the insertion. Furthermore, 30 affected individuals of Scottish descent did not have this rearrangement. In addition, screening of 1,000 control chromosomes of multiple ancestries with GT48 detected no insertions.

The Alu element is flanked by a perfect 9 base pair duplication of the target sequence at the site of insertion (Figure 6-6b), characteristic of mobile element insertion by retrotransposition.
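As a rough consistency check on the sizes reported above, the expected size of the insertion-bearing PCR product can be computed from the 118 bp normal product, the 331 bp Alu element and the 9 bp target-site duplication. This is a sketch only: it assumes that the 331 bp figure excludes the duplicated 9 bp of target sequence, and the constant and function names are illustrative rather than part of the original work.

```python
# Sizes taken from the text (base pairs).
NORMAL_PRODUCT_BP = 118    # primer A / primer B product without the insertion
ALU_INSERT_BP = 331        # inserted Alu element (assumed to exclude the duplication)
TARGET_SITE_DUP_BP = 9     # extra copy of target sequence created on insertion


def expected_insertion_product(normal_bp: int, insert_bp: int, dup_bp: int) -> int:
    """Amplicon size when the insert and its target-site duplication lie between the primers."""
    return normal_bp + insert_bp + dup_bp


expected = expected_insertion_product(NORMAL_PRODUCT_BP, ALU_INSERT_BP, TARGET_SITE_DUP_BP)
observed = 460  # band size estimated on the agarose gel
print(expected, observed - expected)  # 458 2
```

The predicted 458 bp agrees with the 460 bp band to within the resolution of a 1% agarose gel.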
Creation of staggered single-stranded nicks on both strands of DNA, followed by repair synthesis, results in flanking direct repeats during the insertion event [8]. Furthermore, the sequence surrounding the insertion is AT-rich, which is consistent with the hypothesis that Alu elements preferentially integrate into AT-rich regions [8].

If the Alu was inserted at the HD gene, alleles at polymorphisms in this region would be expected to be in strong nonrandom allelic association with HD. The 1.2kb HindIII polymorphism was therefore investigated for allelic association with HD. Allele frequencies differed slightly between the HD and control populations; however, this difference was not significant (Table 6-2).

Sequence analysis of the 1.2kb HindIII fragment containing GT48 did not reveal significant coding potential. The question of whether the Alu was causative of HD in these two families was addressed after the identification of a CAG repeat, expanded in HD patients, in the gene IT15 and the development of an assay for CAG analysis (Chapter 7). CAG analysis of individuals in these two families showed that all affected individuals carrying the Alu insertion also demonstrate CAG expansion in the range associated with HD (Figure 6-7).

6.3 DISCUSSION

The goal of positional cloning strategies is the identification of genes of biological importance. The use of cDNA selection procedures to yield a transcription map for a genomic region provides significant advantages over conventional positional cloning approaches, as numerous candidate genes or gene fragments are directly and quickly available for assessment.

Gene Tracking, a cDNA selection scheme, was used to identify coding sequences from the 1 Mb candidate region around D4S95, from D4S180 to D4S183.
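The association test for the 1.2kb HindIII polymorphism (Table 6-2: χ2 = 3.29, df = 1, p = 0.07) can be reproduced from the allele counts if Yates' continuity correction is assumed. The thesis does not name the test; without the correction the statistic would be about 3.99. A minimal standard-library sketch, with illustrative function names:

```python
import math


def yates_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square with Yates' continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # A chi-square with 1 df is Z**2, so P(chi2 > x) = P(|Z| > sqrt(x)).
    return math.erfc(math.sqrt(chi2 / 2))


# Table 6-2 counts: rows are alleles A and B, columns are HD and control chromosomes.
chi2 = yates_chi_square_2x2(7, 68, 37, 153)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # chi2 = 3.29, p = 0.07
```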
The advantage of Gene Tracking over other methods of transcribed sequence identification is that it enables detection of transcribed sequences regardless of genomic organization, whereas exon trapping, which depends on splice junctions for identification, is unable to detect intronless genes [1,2]. In addition, Gene Tracking is extremely sensitive, able to detect rare transcripts present at only a few copies per cell (J. Rommens, personal communication).

Table 6-2. Allele frequencies for the 1.2kb HindIII polymorphism
ALLELE      HD              CONTROL
            No.     %       No.     %
A             7    15.9      68    30.8
B            37    84.1     153    69.2
TOTAL        44   100.0     221   100.0
χ2 = 3.29, df = 1, p = 0.07

[Figure 6-7: pedigrees of the two families annotated with the CAG repeat sizes of each individual.]
Figure 6-7. CAG repeat sizes for the two families with the Alu retrotransposition. CAG sizes greater than 36 repeats are associated with Huntington disease.

After publication of the identification of the GT clones, a novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes was reported [9]. Sequence comparison with the GT clones revealed that GT70 and GT149 (BIN 1A and BIN 2) are portions of this gene. GT70 corresponds to nucleotides 835 to 1601, beginning 406 nucleotides 3’ of the trinucleotide repeat. GT149 corresponds to nucleotides 5322 to 5930. Cross-hybridization and hybridization to genomic DNA indicated that GT63 overlaps with GT70.

At the time of this analysis, the proximal candidate region for HD spanned 2.2 Mb, with the distal boundary defined by a crossover between D4S98 and D4S43 [10] and the proximal boundary defined by recombination between D4S10 and D4S125 [11]. In one family, identification of a recombination event between markers D4S180 and D4S127 in individual II-6 (Figure 6-4a) indicated that the mutation causing HD was distal to the marker D4S125.
This reduced the candidate region by moving the proximal boundary at least 200kb towards the telomere.

One cDNA clone, GT48, isolated by a direct cDNA selection procedure, detected a DNA rearrangement in two families with HD. The genomic rearrangement was caused by Alu retrotransposition and occurred in a gene-rich area 5’ to the HD gene, approximately 190 kb from the putative start site of the Huntington gene (Figure 6-8).

cDNAs for GT48 and the two adjacent clones, GT44 and GT49, were not detected despite extensive library screening by others in the laboratory, suggesting that the Alu has not inserted within a coding sequence. However, a relationship between Alu transposition and HD remains possible, as the Alu retrotransposition segregates with HD in these two families and was not seen in 1,000 control chromosomes. Furthermore, the rearrangement occurs on a core chromosomal haplotype that is present in 2% of the general population, where it is not associated with any rearrangement.

[Figure 6-8: map from centromere (CEN) to telomere (TEL) showing markers including D4S125 and D4S182, the HD gene with its CAG repeat, GT48, YACs 353G6, 70D11, A187G12, D102A10 and 2A11, and BINs 1 to 5 (BIN 3 subdivided into A to C); scale bar 100 kb.]
Figure 6-8. Schematic map showing the location of the HD gene and GT48 with respect to 4p16.3 markers and the YAC BINs used in mapping.

The same Alu insertion has now been observed on the HD chromosome in 4 additional Scottish families, and has not been observed on 1000 control chromosomes of Scottish ancestry, supporting the specificity of the Alu retrotransposition event to HD chromosomes (J. Warner, personal communication).

The relationship between the insertion event and HD in these two families is not clear. Affected individuals with the Alu insertion demonstrate CAG repeat lengths consistent with HD and show no unusual clinical features attributable to the Alu retrotransposition event. Alu transposition has previously been shown to cause disease by interrupting exonic sequence, as in the factor VIII gene [5] and the cholinesterase gene [12].
Alternatively, the positioning of an Alu element within an intron of a gene can affect processing of the primary transcript, as seen in the NF1 gene [6]. The Alu element identified here is 5’ to the HD gene, 190 kb from the putative initiation codon. It is possible that an Alu insertion could interfere with upstream regulatory sequences, such as enhancers, and affect expression of the gene. However, the demonstration of CAG expansion in these two families, identical to that of other HD families without Alu transposition, suggests that the Alu retrotransposition event is not causative of disease.

Dr. G. Hutchinson analyzed the specific nucleotide variations in each Alu element that allow their classification into subfamilies based on the extent of divergence from the Alu consensus sequence [13]. The Alu element in the two HD families has the internal 7 bp duplication at bp 250, as well as the 7 other single nucleotide changes diagnostic for the Sb2 subfamily [13]. The locations of five other Sb2 Alu elements are known; however, only one is known to disrupt gene activity, by insertion into and inactivation of the cholinesterase gene [12]. The Alu element identified in this chapter does not have the 5 diagnostic nucleotide substitutions of the further subdivided PV subfamily, members of which have been associated with de novo mutations in the factor IX gene [14] and the neurofibromatosis type 1 gene [15].

Sequencing of a 58kb contig on 4p16.3 around the marker D4S98 demonstrated a large number of Alu repeats, with an average density of 1.0 Alu per kb [16]. The presence of this Alu element may therefore reflect preferential Alu insertion into Alu-rich areas. However, whether the Alu contributes to CAG instability, is an effect of CAG instability resulting in broad chromosomal instability and activation of transposable elements, or is located there by chance alone has yet to be determined.

6.4 REFERENCES

1. Duyk GM, Kim S, Myers RM, Cox DR (1990).
Exon trapping: A genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc Natl Acad Sci USA 87:8995-8999.
2. Buckler AJ, Chang DD, Graw SL, Brook JD, Haber DA, Sharp PA, Housman DE (1991). Exon amplification: A strategy to isolate mammalian genes based on RNA splicing. Proc Natl Acad Sci USA 88:4005-4009.
3. Lovett M, Kere J, Hinton LM (1991). Direct selection: A method for the isolation of cDNAs encoded by large genomic regions. Proc Natl Acad Sci USA 88:9628-9632.
4. Parimoo S, Patanjali SR, Shukla H, Chaplin DD, Weissman SM (1991). cDNA selection: Efficient PCR approach for the selection of cDNAs encoded in large chromosomal DNA fragments. Proc Natl Acad Sci USA 88:9623-9627.
5. Rommens J, Lin B, Hutchinson GB, Andrew SE, Goldberg YP, Glaves ML, Graham R, Lai V, McArthur J, Nasir J, Theilmann J, McDonald H, Kalchman M, Clarke LA, Shappert K, Hayden MR (1993). A transcription map of the region containing the Huntington disease gene. Hum Mol Genet 2:901-907.
6. Goldberg YP, Lin B-Y, Andrew SE, Nasir J, Graham R, Glaves ML, Hutchinson G, Theilmann J, Ginzinger DG, Shappert K, Clarke L, Rommens JM, Hayden MR (1992). Cloning and assessment of the α-adducin gene close to D4S95 and assessment of its relationship to Huntington disease. Hum Mol Genet 1:669-675.
7. MacDonald ME, Novelletto A, Lin C, Tagle D, Barnes G, Bates G, Taylor S, Allito B, Altherr M, Myers R, Lehrach H, Collins FS, Wasmuth JJ, Frontali M, Gusella JF (1992). The Huntington’s disease candidate region exhibits many different haplotypes. Nature Genet 1:99-103.
8. Daniels GR, Deininger PL (1985). Integration site preferences of the Alu family and similar repetitive DNA sequences. Nucl Acids Res 13:8939-8954.
9. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
10. Snell RG, Thompson LM, Tagle DA, Holloway TL, Barnes G, Harley HG, Sandkuijl LA, MacDonald ME, Collins FS, Gusella JF, Harper PS, Shaw DJ (1992). A recombination that redefines the Huntington disease region. Am J Hum Genet 51:357-362.
11. Bates GP, MacDonald ME, Baxendale S, Youngman S, Lin C, Whaley WL, Wasmuth JJ, Gusella JF, Lehrach H (1991). Defined physical limits of the Huntington disease gene candidate region. Am J Hum Genet 49:7-16.
12. Muratani K, Hada T, Yamamoto Y, Kaneko T, Shigeto Y, Ohue T, Furuyama J, Higashino K (1991). Inactivation of the cholinesterase gene by Alu insertion: Possible mechanism for human gene transposition. Proc Natl Acad Sci USA 88:11315-11319.
13. Hutchinson GB, Andrew SE, MacDonald H, Goldberg YP, Graham R, Rommens JM, Hayden MR (1993). An Alu element retrotransposition in two families with Huntington disease defines a new active Alu subfamily. Nucl Acids Res 21:3379-3383.
14. Vidaud D, Vidaud M, Bahnak BR, Siguret V, Gispert Sanchez S, Laurian Y, Meyer D, Goosens M, Lavergne JM (1993). Haemophilia B due to a de novo insertion of a human-specific Alu subfamily member within the coding region of the factor IX gene. Eur J Hum Genet 1:30-36.
15. Wallace MR, Anderson LB, Saulino AM, Gregory PE, Glover TW, Collins FS (1991). A de novo Alu insertion results in neurofibromatosis type 1. Nature 353:864-866.
16. McCombie WR, Martin-Gallardo A, Gocayne JD, FitzGerald M, Dubnick M, Kelley JM, Castilla L, Liu L, Wallace S, Trapp S, Tagle D, Whaley WL, Cheng S, Gusella J, Frischauf A-M, Poustka A, Lehrach H, Collins FS, Kerlavage AR, Fields C, Venter JC (1992). Expressed genes, Alu repeats and polymorphisms from chromosome 4. Nature Genet 1:348-353.

CHAPTER 7
THE RELATIONSHIP BETWEEN TRINUCLEOTIDE (CAG) REPEAT LENGTH AND CLINICAL FEATURES OF HUNTINGTON DISEASE

The work in this chapter has contributed to two publications.

Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman MA, Graham RK, Hayden MR (1993).
The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington’s disease. Nature Genet 4:398-403.

Goldberg YP, Andrew SE, Clarke LA, Hayden MR (1993). A PCR method for accurate assessment of trinucleotide repeat expansion in Huntington disease. Hum Mol Genet 6:635-636.

7.1 INTRODUCTION

The Huntington Disease Collaborative Research Group isolated a novel gene containing a trinucleotide repeat (CAG) that is expanded on HD chromosomes. The CAG repeat is located at the 5’ end, within the putative coding region. The gene is located between D4S180 and D4S127, and produces two transcripts of 10 and 13 kb encoding a protein of 348 kDa with no homology to any other known protein. In the initial report, this highly polymorphic CAG repeat was shown to range from 11 to 34 copies on normal chromosomes, while in the affected individuals analyzed the CAG repeat had expanded beyond 42 repeats, with the largest expansion observed being 100 trinucleotides.

The description of this trinucleotide expansion demonstrates that dynamic mutations are now associated with at least seven human genetic diseases, including fragile X syndrome (FRAXA) [2,3], X-linked mental retardation (FRAXE) [4], myotonic dystrophy (DM) [5-7], X-linked spinal and bulbar muscular atrophy (SBMA) [8], spinocerebellar ataxia (SCA) [9], dentato-rubro-pallido-luysian atrophy (DRPLA) [10,11] and HD (reviewed in [12-16]).

Prior studies of these diseases have demonstrated a strong relationship between trinucleotide expansion and clinical severity in offspring. For example, in DM there is an association between an increase in repeat length and earlier clinical onset of disease [17-19]. Furthermore, trinucleotide expansion has also provided a molecular basis for the anticipation seen in the transmission of DM [17-19].
This is also seen in fragile X, where CCG repeat length correlates with the risk of expansion associated with mental retardation [12]. In SBMA, the age of onset and the age at clinical milestones, such as difficulty with stair climbing, decreased with increasing length of the CAG repeat [20]. Although a correlation between disease severity and CAG repeat length was demonstrated, factors other than the trinucleotide repeat appear to contribute significantly to the SBMA disease phenotype [20].

Genetic and environmental factors have both been invoked to account for the variation in age of onset of HD in different families. Studies of monozygotic twins, however, have clearly shown through the high concordance of age of onset that genetic factors play a major role [21,22]. The purpose of this analysis was to assess the relationship between CAG repeat length and the clinical features of HD. In this chapter the expanded allele of the HD patient is referred to as the “upper allele” and the normal-sized allele as the “lower allele”.

7.2 RESULTS

7.2.1 DEVELOPMENT OF A PCR ASSAY FOR ASSESSMENT OF THE CAG REPEAT

In the initial report of the CAG repeat associated with HD, the Huntington Disease Collaborative Research Group detected CAG trinucleotide repeat expansion using primers HD1 and HD2, which span the CAG repeat as well as two separate adjacent CCG trinucleotide repeats (Figure 7-1). This highly repetitive DNA, together with its high GC content, makes the region difficult to analyze by PCR. In order to reduce the GC content of the PCR product and to produce a smaller product for more accurate sizing of the CAG expansion, PCR conditions were developed using a new primer set (HD344 and HD5) that eliminates one of the CCG repeats from the amplification product (Figure 7-1). This amplification allowed accurate determination of the length of the CAG repeat on both the normal and the disease alleles. PCR was established with primers HD344 (5’ CCTTCGAGTCCCTCAAGTCCTTC 3’) and HD5 (5’ CGGCTGAGGCAGCAGCGG 3’).
PCR conditions were 3 mM MgCl2, 50 mM KCl, 20 mM Tris pH 8.4, 2% formamide, 100 µM dATP, 100 µM dCTP, 100 µM dTTP, 75 µM 7-deaza-dGTP, 25 µM dGTP, 0.5 µM of each primer, 0.3 pmol of γ-32P end-labeled primer and 1.25U of Taq DNA polymerase per 25 µl reaction. Thermal cycling conditions were 95 °C for 5 min, followed by 30 cycles of 94 °C for 1 min, 63 °C for 1 min and 72 °C for 1 min, with a final extension at 72 °C for 7 min. PCR products were resolved on 6% polyacrylamide gels and scored against an M13 sequencing ladder. The CAG trinucleotide length was determined by subtracting 110 bp of non-CAG-containing DNA from the PCR product size and dividing the remainder by 3.

[Figure 7-1: schematic of the (CAG) and (CCG) repeats with the positions of primers HD1, HD344, HD5, HD482 and HD2.]
Figure 7-1. Primers for amplification of the trinucleotide repeat. Primers HD1 and HD2 are as previously published*. HD1 = 5’ ATGAAGGCCTTCGAGTCCCTCAAGTCCTC 3’. HD2 = 5’ AAACTCACGGTCGGTGCAGCGGCTCCTCAG 3’. Primers HD344 and HD5 were synthesized according to the published sequence: HD344 = 5’ CCTTCGAGTCCCTCAAGTCCTC 3’; HD5 = 5’ CGGCTGAGGCAGCAGCGGCGT 3’. PCR was further modified by using HD344 together with HD482. HD482 = 5’ GGCTGAGGAAGCTGAGGAG 3’.
*Huntington Disease Collaborative Research Group (1993). Cell 72:971-983.

PCR across the CAG repeat was subsequently further optimized by the design of a new primer, HD482 (5’ GGCTGAGGAAGCTGAGGAG 3’), used in combination with primer HD344 (Figure 7-1). Formamide (2%) and glycerol (15%) were also found to be essential for improved PCR across this GC-rich region, achieving both specificity and good product yield [23]. Numerous other additives were tested to improve the PCR, including 0-10% DMSO, TMAC, gelatin, Triton X-100 and BSA, without any improvement in the product. PCR conditions were 2 mM MgCl2, 50 mM KCl, 20 mM Tris pH 8.4, 3.5% formamide, 15% glycerol, 200 µM of each dNTP, 0.5 µM of each primer, 0.3 pmol of γ-32P end-labeled primer and 1.25U of Taq DNA polymerase per 25 µl reaction.
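The sizing rule used with both primer pairs (subtract the non-CAG flanking DNA, 110 bp for HD344/HD5 and 76 bp for HD344/HD482, from the product size, then divide by three) can be sketched as follows. The dictionary and function names are hypothetical; only the flanking lengths come from the text.

```python
# Length of non-CAG DNA in each amplicon, as stated in the text.
FLANK_BP = {
    ("HD344", "HD5"): 110,
    ("HD344", "HD482"): 76,
}


def cag_repeats(pcr_size_bp: int, primers: tuple[str, str]) -> float:
    """Number of CAG repeats: (product size - non-CAG flanking DNA) / 3."""
    cag_bp = pcr_size_bp - FLANK_BP[primers]
    if cag_bp < 0:
        raise ValueError("product is shorter than its non-CAG flanking DNA")
    return cag_bp / 3


print(cag_repeats(269, ("HD344", "HD5")))    # 53.0
print(cag_repeats(208, ("HD344", "HD482")))  # 44.0
```

For example, a 53-repeat allele amplified with HD344/HD5 gives a 269 bp product and an 18-repeat allele a 164 bp product, both within the 150-270 bp ladder range shown in Figure 7-2.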
Thermal cycling conditions were 95 °C for 3 min, followed by 30 cycles of 94 °C for 1 min, 63 °C for 1 min and 72 °C for 1 min, with a final extension at 72 °C for 7 min. The CAG trinucleotide length was determined by subtracting 76 bp of non-CAG-containing DNA from the PCR product size.

Six percent denaturing polyacrylamide gels were found to be best for resolving and accurately sizing the trinucleotide repeat in HD patients (Figure 7-2). Fractionation of PCR products on non-denaturing polyacrylamide gels gave good resolution but did not allow accurate sizing. For detection of PCR products, γ-32P end-labeling of HD5 gave less background than α-32P dCTP incorporation during thermal cycling. PCR across repetitive regions of DNA often results in multiple band patterns of variable intensity, creating difficulties in accurately sizing the repeat expansion. The pattern of the stutter is, however, consistent from gel to gel. To size the polymorphic alleles, the band of darkest intensity (generally the second-largest band of the stutter) was scored against an M13 sequencing ladder (Figure 7-2). The CAG trinucleotide length was determined by subtracting 76 bp or 110 bp of non-CAG-containing DNA from the PCR product size, depending on the primer set used, and dividing by 3. Failure to detect an expanded allele necessitates Southern blot analysis to distinguish between a normal individual homozygous for a normal allele and an affected patient with an expansion too large for analysis by PCR.

[Figure 7-2: autoradiograph with M13 ladder sizes from 150 to 270 bp; bands labelled by number of trinucleotide repeats (TNR), including 53, 49, 47, 44 and 42.]
Figure 7-2. PCR amplification of DNA from HD patients using primers HD344 and HD5. PCR products were resolved on a 6% denaturing polyacrylamide gel, dried and subjected to autoradiography overnight. TNR = number of trinucleotide repeats; TNR = (PCR size - 110)/3.

7.2.2 THE ASSOCIATION OF TRINUCLEOTIDE EXPANSION AND AGE OF ONSET

A total of 366 persons (259 unrelated families) constituted the cohort for this study (Table 7-1).
Age of onset of HD was known for all persons, while information concerning the clinical presentation, including involuntary movement disorder and psychiatric and cognitive dysfunction, was available for a subset. In this cohort, age of onset was defined as the age at which the first clearly defined abnormality was apparent, including involuntary movements, psychiatric or cognitive abnormalities, or inability to perform complex hand movements, as manifested by clumsiness.

Using both PCR methods described, both HD and normal alleles were detected in 98.4% of the cohort (360/366) (Table 7-2). Failure to detect an expanded allele in the remaining 6 persons may reflect technical failure, patients with HD without an upper allele, normal persons homozygous for lower alleles, persons misdiagnosed as having HD, or sample mix-up. These 6 individuals had no clinical characteristics suggesting that their exclusion would bias the analysis of the remaining 360 patients, and they were excluded from this analysis.

TABLE 7-1: Demographic and Clinical Data of Study Cohort (N = 360)
Sex distribution:
  Male                     191
  Female                   169
Age of onset:
  Mean                     41.5 ± 12.4
  Range                    5-85
Affected parent (by age-of-onset group):
  Age             N     Mother   Father   Unknown
  0-20           20        7       13        0
  21-40         167       75       82       10
  41-60         149       64       68       17
  61-80          23        7        9        7
  >81             1        0        0        1
  Total cohort  360      153      172       35

TABLE 7-2: CAG repeat length in cohort
                              N     Median   Range
  Length of lower allele     360      20     11-37
  Length of upper allele     360      44     38-121
Length of upper allele by age of onset:
  Age of onset                N     Median   Range
  0-20                        20     56.5    46-121
  21-40                      167     46      38-75
  41-60                      149     43      39-52
  61-80                       23     42      40-44
  >81                          1     43      -
Group differences and pairwise comparisons of upper allele length are all significant (p < 0.001), except for the comparison between the 41-60 and 61-80 year groups (N.S.).

The median size of the upper allele, representing the expanded HD chromosome, was 44 repeats with a range of 38-121 repeats, while the non-HD allele ranged from 11 to 37 repeats with a median of 20.
The CAG repeat length on 140 control chromosomes from individuals unrelated to affected persons ranged from 10 to 31, with a median of 17 repeats (Table 7-2).

To examine the relationship between age of onset and CAG length, linear regression was used with logarithmic transformation of the age of onset, allowing an exponential relationship to be treated as an intrinsically linear model. A highly significant correlation (n = 360, r = -0.70, p < 10^-7) was evident between the length of the trinucleotide repeat and age of onset in the whole cohort, accounting for approximately 50% of the total variation in age of onset (r² = 0.49) (Figure 7-3). The regression curve was derived according to the formula: ln [age of onset] = 5.3379 - 0.0363 × [trinucleotide size]. When analysed by age of onset in 20-year intervals, the 0-20 and 21-40 groups had significantly greater trinucleotide repeat lengths than the older age-of-onset groups (Table 7-2).

A similar assessment was made of the association between trinucleotide expansion and the age of onset of other clinical features. A highly significant correlation was seen between CAG expansion and age of onset of chorea (n = 122, r = 0.611, p = 10^-…), clumsiness (n = 69, r = 0.636, p = 10^-…), dementia (n = 37, r = 0.577, p = 10^-…) and psychiatric signs (psychosis, depression or severe behavioral problems) (n = 84, r = 0.518, p = 10^-6).

Prior studies have shown earlier onset in offspring of affected males2’8. In our cohort, those with an affected father had an age of onset 1 year lower than those with an affected mother (p < 0.001) (Table 7-3).
The presence of juvenile-onset patients may have been responsible for this finding, as their exclusion from the analysis resulted in no difference in onset age between offspring of an affected father and offspring of an affected mother. This confirms the sex-of-parent effect long observed in HD families. The difference in onset age depending on the sex of the affected parent was not, however, reflected in the size of the upper allele, which did not differ significantly between the two parental groups (Table 7-3). Neither the grandparental origin of the gene nor the parent/grandparent pattern of inheritance of the mutant allele was associated with any difference in upper allele length for the group as a whole (Table 7-3).

[Figure 7-3: scatter plot of age of onset (0-100 years) against CAG-repeat length (35-85) with the regression curve; N = 360, r = -.70, r² = .49.]
Figure 7-3. Age of onset by length of CAG repeat. The regression curve was calculated on log-transformed age of onset data. One patient with onset at age 5 and a CAG length of 121 repeats is not shown, as this was off scale.

TABLE 7-3: Length of upper and lower allele by sex of parent and grandparent
Affected parent/grandparent     N    Upper allele length    Mean age of onset
                                     Median   Range         (± standard deviation)
Total cohort:
  Mother                      153      44     39-65         41.1 ± 11.3 *
  Father                      172      44     38-121        40.1 ± 12.7 *
Juvenile onset:
  Mother                        7      57     49-61         17.7 ± 3.5
  Father                       13      53     46-121        16.0 ± 4.0
Total cohort:
  Mother/grandmother           46      45     40-58         39.4 ± 11.7
  Mother/grandfather           44      44.5   39-65         39.4 ± 9.6
  Father/grandmother           67      45     40-70         35.2 ± 10.2
  Father/grandfather           32      45     38-121        36.4 ± 12.8
Juvenile onset:
  Mother/grandmother            4      56.5   52-58         19.3 ± 1.0
  Mother/grandfather            1      49     -             19
  Father/grandmother            8      54     46-78         16.4 ± 2.6
  Father/grandfather            3      68     48-121        13.7 ± 7.8
* p < 0.001 (determined by t-test). No other differences are significant.
No significant differences were seen in age of onset with maternal or paternal inheritance when analyzed by the non-parametric Wilcoxon and Kruskal-Wallis tests (data not shown).

The age of death of persons with HD is a specific point in time and is not subject to the same sources of potential bias as other measurements of the natural history of this disorder, such as age of onset. A significant correlation was found between the length of the trinucleotide expansion and the age of death of patients with HD (n = 51, r = 0.423, p < 0.01). This parameter may also have been subject to some ascertainment bias, in that longer survivors in this cohort would still be alive and therefore not included in this analysis.

7.2.3 THE CORRELATION BETWEEN TRINUCLEOTIDE LENGTH AND PRESENTATION OF DIFFERENT CLINICAL FEATURES

An assessment was made of the relationship between the length of the trinucleotide expansion and the major clinical feature at diagnosis. The cohort was divided into those who had chorea, psychiatric disturbance (psychosis or depression), dementia, or rigidity as the major presenting feature. In each group there was no association, independent of age of onset, between repeat length and a particular clinical presentation.

7.2.4 THE VARIATION IN TRINUCLEOTIDE EXPANSION IN PERSONS WITH JUVENILE ONSET

Data were available on 20 persons with juvenile-onset HD. In this group the size of the expansion ranged from 46 to 121 repeats, with a median of 56.5 repeats. One patient with onset of disease at age 5 had an expansion of 121 trinucleotide repeats. The remainder had repeat sizes of 78 or lower, and six of these were in the range of patients with adult-onset disease. Regression analysis of age of onset on CAG length revealed a significant correlation (r = -0.79; r² = 0.62; n = 20; p < 10^-…). In this small sample, no significant difference in repeat length was found between those who inherited a mutant allele of paternal origin and those who inherited one of maternal origin (Table 7-3).
Similarly, parental/grandparental origin of the mutant allele was not correlated with CAG repeat length (Table 7-3).

7.2.5 THE PREDICTIVE VALUE OF TRINUCLEOTIDE EXPANSION IN THE DETERMINATION OF AGE OF ONSET

The significant correlation between the age of onset and the size of the trinucleotide repeat expansion raised the possibility that age of onset could be predicted from trinucleotide length (Figure 7-4). To test this, a random number generator was used to divide the whole cohort (n = 360) into two smaller groups. The first group (n = 190) was used to recalculate a regression equation and confidence limits for prediction, while the second group (n = 170) was used to test these confidence limits (test sample). The regression equation obtained from the 190 cases was ln [age of onset] = 5.4053 - 0.0377 × [trinucleotide size], similar to the equation obtained using the whole cohort (r = -0.64; r² = 0.41; p < 10^-7) (Figure 7-3). The model generated using the first group described the test sample well: 94.1% (160/170) fell within the range predicted by the 95% confidence limits and 97.6% (166/170) within the range predicted by the 99% confidence limits.

[Figure 7-4: age of onset (0-100 years) against CAG-repeat length (35-85) with confidence curves for predicted age.]
Figure 7-4. Confidence intervals for predicted age given the size of the CAG repeat. Outer curves delineate the 99% confidence interval, while inner curves show the 95% confidence interval.

However, at the lower end of the range of CAG repeats the confidence limits for age of onset predictions are very broad (Figure 7-4). For example, with a trinucleotide repeat length of 45, the expected age of onset would be 41, with 95% confidence limits of 25 and 66 years. As the trinucleotide expansion increases to over 50, however, these limits narrow. A trinucleotide size of 60 would give an expected age of onset of 23, with 95% confidence limits of 14 and 38 years.
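The split-sample equation can be back-transformed to reproduce the expected ages quoted above. The sketch below also measures, on the ln(age) scale, how far the reported 95% limits lie from each prediction; it uses only numbers given in the text, and the function names are illustrative. Reading the narrowing of the limits as the effect of a roughly constant log-scale band is an interpretation, not a claim made by the thesis.

```python
import math

# Regression fitted on the random half of the cohort (n = 190), as reported:
# ln[age of onset] = 5.4053 - 0.0377 * [trinucleotide size]
INTERCEPT = 5.4053
SLOPE = -0.0377


def predicted_onset_age(cag_repeats: int) -> float:
    """Back-transform the log-linear model to an expected age of onset in years."""
    return math.exp(INTERCEPT + SLOPE * cag_repeats)


def log_scale_distances(lower: float, expected: float, upper: float) -> tuple[float, float]:
    """Distances on the ln(age) scale from the expected age to the lower and upper limits."""
    return math.log(expected / lower), math.log(upper / expected)


# 95% limits quoted in the text for 45 and 60 repeats.
reported_limits = {45: (25, 66), 60: (14, 38)}

for cag, (lower, upper) in reported_limits.items():
    expected = round(predicted_onset_age(cag))
    down, up = log_scale_distances(lower, expected, upper)
    print(f"{cag} repeats: expected {expected} y, ln-scale distances {down:.2f} / {up:.2f}")
```

Both predictions round to the reported 41 and 23 years, and at both repeat lengths the quoted limits sit about 0.5 units either side of the prediction on the ln scale. A band of fixed width on the log scale spans fewer years when the predicted age is lower, which is consistent with the narrowing limits described above.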
However, it should be noted that only a small percentage (4.1%) of the total cohort had repeat lengths of 60 or greater.

7.2.6 THE PRECISION OF ASSESSMENT OF TRINUCLEOTIDE EXPANSION

A critical issue in the development of the predictive model is the accuracy and reproducibility of the assessment of trinucleotide expansion using the methods described. A total of 37 persons were chosen randomly and assessed in two separate PCR experiments. Of these 37, 11 had exactly the same size of expanded allele in both analyses; the differences ranged from -3 to +3 repeats. The mean ± SD of the differences for the upper allele was 0.6 ± 1.9. The size of the lower allele differed by no more than 2 repeats for 97% (36/37) of subjects, with 19 persons showing no difference (mean ± SD = 0.3 ± 1.2). These size differences arose when PCR products from the same DNA samples were sized on different gels.

To examine the consequences of this error for the results of the regression analysis, the CAG lengths were varied randomly within the limits of the original dataset according to a normal distribution with the appropriate standard deviation (1.9). Repeat regression analysis with the new, randomly displaced CAG lengths gave results in complete agreement with the previous analysis. Thus, with this PCR approach, the relatively minor differences in reproducibility have no significant impact on the predicted ages of onset.

7.2.7 PARENT-CHILD CORRELATIONS OF TRINUCLEOTIDE EXPANSION

Having demonstrated a significant association between age of onset and repeat length, it was hypothesized that differences in onset ages within families would be explained by differences in repeat length. Both parent-child and sib pairs were examined. Within this cohort of 360 people there were 25 affected parent-child pairs, including 4 in which the child had juvenile onset.
The correlation between repeat length in affected parent and child was not significant (r = 0.33, p = 0.10, n = 25). No difference in repeat length was seen in 1 parent-child pair, while 16 affected children had an increase of between 1 and 4 repeats. Four offspring had an increase of between 8 and 20 repeats, while 1 offspring with juvenile onset had an increase of 74 repeats. In this instance, a father with a trinucleotide expansion length of 42 had a child with onset at age 5 and a trinucleotide expansion of 121, the largest expanded allele in this cohort. In two other parent-juvenile onset pairs analysed, the differences in trinucleotide expansion between the parent with adult onset and the child with juvenile onset were 8 and 13 trinucleotides, respectively (Figure 7-5). Two children had a decrease of 1 repeat and 1 affected child had a decrease of 2 repeats compared to their affected parent.

Large differences in age of onset between parent and child, however, did not always parallel differences in trinucleotide repeat length. In this study, 8 parent-child pairs in which both manifested adult onset disease showed anticipation of between 10 and 30 years but did not have differences in trinucleotide repeat expansion greater than three. Therefore, no appreciable expansion between parent and adult onset offspring was apparent, even when the difference in age of onset was similar to that seen between a parent and a juvenile onset child.

[Figure 7-5. Change in repeat length between 25 affected parent-child pairs. One parent-child pair with a difference of 74 repeats is not shown as this was off scale. The sex of the affected parent is noted (mother affected / father affected). x-axis: CAG-repeat length, parent.]

7.2.8 SIB-SIB CORRELATIONS OF TRINUCLEOTIDE EXPANSION

A total of 48 pairs of affected siblings were included in this analysis (Figure 7-6). The correlation between the siblings for trinucleotide expansion was significant (r = 0.66, p < 0.001).
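The p-values reported for the parent-child (r = 0.33, n = 25) and sib-sib (r = 0.66, n = 48) correlations can be checked against the standard t statistic for a Pearson correlation, t = r√(n - 2) / √(1 - r²), with n - 2 degrees of freedom. The Python sketch below is an illustrative consistency check, not necessarily the test used in this thesis. Because the Python standard library has no Student's t distribution, the two-sided 5% critical values (2.069 for 23 df, about 2.013 for 46 df) are entered by hand from standard tables.

```python
import math


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, given Pearson r from n pairs."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 pairs")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Two-sided 5% critical values of Student's t, keyed by degrees of freedom
# (values taken from standard tables, not computed here).
T_CRIT_05 = {23: 2.069, 46: 2.013}

if __name__ == "__main__":
    for label, r, n in (("parent-child", 0.33, 25), ("sib-sib", 0.66, 48)):
        t = correlation_t(r, n)
        significant = abs(t) > T_CRIT_05[n - 2]
        print(f"{label}: r = {r}, n = {n}, t = {t:.2f}, significant at 5%: {significant}")
```

The parent-child pairs give t ≈ 1.68, below the 5% critical value and consistent with p = 0.10; the sib pairs give t ≈ 5.96, consistent with p < 0.001.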
No difference in repeat length was seen in 8 sib pairs, while 18 sib pairs differed by only 1 repeat. Sixteen affected sib pairs had a difference of 2 to 6 repeats and the remaining 6 sib pairs had repeat length differences between 7 and 16 (Figure 7-6). Interestingly, in this latter group the father was always the transmitting parent.

Nineteen pairs of siblings had differences of more than 5 years in age of onset, ranging from 6 to 29 years. In 8 of these sib pairs the repeat lengths were identical or differed by only one repeat. In contrast, five pairs of siblings had ages of onset within 5 years of each other but had repeat length differences of between 4 and 11 (Figure 7-6). It is therefore apparent that repeat length alone cannot account for the observed differences in age of onset between siblings.

[Figure 7-6. Change in repeat length between 49 pairs of affected siblings. The sex of the affected parent is noted (mother affected / father affected). x-axis: CAG-repeat length, sib 1.]

7.3 DISCUSSION

The analysis of this cohort of 360 affected persons from 259 unrelated families with HD has demonstrated a significant correlation between the number of CAG repeats and the age of onset. This association was present irrespective of the mode of clinical presentation at the time of onset. The number of trinucleotide repeats in the upper allele accounted for approximately 50% of the variation in age of onset. Repeat length, however, was not indicative of any other particular clinical phenotype, as there was no independent association between the major clinical features of the illness and the number of trinucleotide repeats.

Assessment of the repeat expansion in 25 parent-child pairs revealed no significant correlation. In contrast, the sib-sib correlation of repeat length was significant (p < 0.001). This is consistent with the previously reported observation of aggregation of age of onset amongst siblings.
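The discussion that follows separates pairs with small repeat-length differences (<6) from those with larger ones and notes the transmitting parent of the latter. The Python sketch below shows that tally under stated assumptions: the `PairRecord` layout, the use of absolute differences, and the example values are all hypothetical and are not data from this cohort. The cut-off is left as a parameter because the sib-pair counts in Section 7.2.8 group differences of up to 6 repeats with the small ones.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PairRecord:
    """Hypothetical record for one pair of affected relatives."""
    cag_first: int            # parent, or first sib
    cag_second: int           # child, or second sib
    transmitting_parent: str  # "father" or "mother"


def tally_differences(pairs, threshold=6):
    """Count pairs whose absolute repeat difference is below `threshold`
    ("small"), and tally the transmitting parent for the remaining pairs."""
    small = 0
    large_by_parent = Counter()
    for pair in pairs:
        if abs(pair.cag_second - pair.cag_first) < threshold:
            small += 1
        else:
            large_by_parent[pair.transmitting_parent] += 1
    return small, large_by_parent


if __name__ == "__main__":
    # Made-up pairs for illustration only; not values from the thesis cohort.
    example = [
        PairRecord(44, 45, "mother"),
        PairRecord(42, 42, "father"),
        PairRecord(45, 56, "father"),
        PairRecord(47, 49, "mother"),
    ]
    small, large = tally_differences(example)
    print(f"small differences: {small}; large differences by parent: {dict(large)}")
```

With these made-up pairs, three differences are small and the single large one (11 repeats) is recorded against the father.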
The majority (21/25 parent-child pairs and 42/48 affected sib pairs) showed small differences in CAG repeat length (<6). However, in the six affected sib pairs with greater differences in repeat length, the transmitting parent was always the father. This suggests that expanded CAG repeats inherited through the male germline may be more likely to undergo significant expansion.

In three of the four parent-child pairs in which the child had juvenile onset (anticipation), the earlier onset was associated with a significant increase in trinucleotide repeat length. However, anticipation in age of onset and increases in repeat length do not always go together. Eight parent-child pairs were identified in which the difference in age of onset between parent and child was 10 years or greater, but the difference in trinucleotide expansion was 2 repeats or less. All of these offspring had onset between the ages of 23 and 54. In addition, five pairs of siblings showed differences of 4 to 12 repeats with no difference in age at onset, indicating that moderate changes in repeat length are not always associated with changes in age at onset.

Within families, therefore, repeat length may occasionally show a significant increase without a reported change in age of onset. Conversely, there may be obvious changes in age of onset with no measurable change in repeat length. Thus it would appear, particularly in persons with adult onset HD, that factors other than trinucleotide repeat expansion may also play a significant role in determining age of onset.

The identification of trinucleotide repeat expansion in this gene has many implications for predictive testing programs. With regard to the predictability of phenotype from the length of the trinucleotide repeat expansion, it is evident that very broad ranges of predicted age of onset can be derived from the number of trinucleotide repeats.
However, it is also apparent that this would be useful only for a minority of persons at risk for HD (4.1%) who have repeat lengths greater than 60.

Curves have previously been constructed that estimated age of onset in offspring from age of onset in the parent²⁹. As part of prior counseling programs, persons at risk have been informed that there is, in general, an aggregation of age of onset amongst siblings, with less correlation of age of onset between parent and child. The specific estimates of age of onset with appropriate confidence limits developed in the predictive model may add benefit to such counseling programs for a small proportion of patients.

The observation that 6 individuals diagnosed with HD do not show expansion of the CAG repeat has several implications for previous analyses that included these individuals. Further analyses of the individuals lacking expanded repeats are presented in Chapter 9, where the effects of including results from these individuals in previous analyses, such as linkage disequilibrium analyses or analysis of recombinant chromosomes, are discussed.

The highly significant association between trinucleotide repeat length and clinical features of HD has now been confirmed by other studies of different populations, primarily of British and Western European descent³⁰⁻³⁵.

7.4 REFERENCES

1. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington disease chromosomes. Cell 72:971-983.
2. Yu S, Pritchard M, Kremer E, Lynch M, Nancarrow J, Baker E, Holman K, Mulley JC, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Fragile X genotype characterized by an unstable region of DNA. Science 252:1179-1181.
3. Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Mapping of DNA instability at the fragile X to a trinucleotide repeat sequence p(CCG)n.
Science 252:1711-1714.
4. Knight SJL, Flannery AV, Hirst MC, Campbell L, Christodoulou Z, Phelps SR, Pointon J, Middleton-Price HR, Barnicoat A, Pembrey ME, Holland J, Oostra BA, Bobrow M, Davies KE (1993). Trinucleotide repeat amplification and hypermethylation of a CpG island in FRAXE mental retardation. Cell 74:127-134.
5. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barcelo J, O'Hoy K, Leblond S, Earle-MacDonald J, de Jong PJ, Wieringa B, Korneluk RG (1992). Myotonic dystrophy mutation: an unstable CTG repeat in the 3' untranslated region of the gene. Science 255:1253-1255.
6. Fu Y-H, Pizzuti A, Fenwick RG Jr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasses GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, Caskey CT (1992). An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255:1256-1258.
7. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ, Housman DE (1992). Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase family member. Cell 68:799-808.
8. La Spada AR, Wilson EM, Lubahn DB, Harding AE, Fischbeck KH (1991). Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature 352:77-79.
9. Orr HT, Chung M, Banfi S, Kwiatkowski TJ Jr, Servadio A, Beaudet AL, McCall AE, Duvick LA, Ranum LPW, Zoghbi HY (1993). Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nature Genet 4:221-226.
10. Nagafuchi S, Yanagisawa H, Sato K, Shirayama T, Ohsaki E, Bundo M, Takeda T, Tadokoro K, Kondo I, Murayama N, Tanaka Y, Kikushima H, Umino K, Kurosawa H, Furukawa T, Nihei K, Inoue T, Sano A, Komure O, Takahashi M, Yoshizawa T, Kanazawa I, Yamada M (1994).
Expansion of an unstable CAG trinucleotide on chromosome 12p in dentatorubral pallidoluysian atrophy. Nature Genet 6:14-18.
11. Koide R, Ikeuchi T, Onodera O, Tanaka H, Igarashi S, Endo K, Takahashi H, Kondo R, Ishikawa A, Hayashi T, Saito M, Tomoda A, Miike T, Naito H, Ikuta F, Tsuji S (1994). Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nature Genet 6:9-13.
12. Fu YH, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJMH, Holden JJA, Fenwick RG Jr, Warren ST, Oostra BA, Nelson DL, Caskey CT (1991). Variation of the CCG repeat at the fragile X site results in genetic instability: Resolution of the Sherman paradox. Cell 67:1047-1058.
13. Richards RI and Sutherland GR (1992). Dynamic mutations: A new class of mutations causing human disease. Cell 70:709-712.
14. Hayden MR (1993). On planting alfalfa and growing orchids: The cloning of the gene causing Huntington disease. Clin Genet 43:217-222.
15. Mandel JL (1993). Questions of expansion. Nature Genet 4:8-9.
16. Nelson DL and Warren ST (1993). Trinucleotide repeat instability: when and where. Nature Genet 4:107-108.
17. Redman JB, Fenwick RG Jr, Fu YH, Pizzuti A, Caskey CT (1993). Relationship between parental trinucleotide CTG repeat length and severity of myotonic dystrophy in offspring. JAMA 269:1960-1965.
18. Tsilfidis C, MacKenzie AE, Mettler G, Barcelo J, Korneluk RG (1992). Correlation between CTG trinucleotide repeat length and frequency of severe congenital myotonic dystrophy. Nature Genet 1:192-195.
19. Hunter A, Tsilfidis C, Mettler G, Jacob P, Mahadevan M, Surh L, Korneluk R (1992). The correlation of age of onset with CTG trinucleotide repeat amplification in myotonic dystrophy. J Med Genet 29:774-779.
20. La Spada AR, Roling DB, Harding AE, Warner CL, Spiegel R, Hausmanowa-Petrusewicz I, Yee W-C, Fischbeck KH (1992).
Meiotic stability and genotype-phenotype correlation of the trinucleotide repeat in X-linked spinal and bulbar muscular atrophy. Nature Genet 2:301-304.
21. Hayden MR (1981). Huntington's chorea. Springer-Verlag, New York.
22. Harper PS (1991). Huntington's disease. W.B. Saunders, London.
23. Sarkar G, Kapelner S, Sommer SS (1991). Formamide can dramatically improve the specificity of PCR. Nucl Acids Res 18:7465.
24. Merritt AD, Conneally PM, Rahman NF, Drew AL (1969). Juvenile Huntington's chorea. In: Progress in neurogenetics, Barbeau A, Brunnett JR, eds. Excerpta Medica, Amsterdam, pp 645-650.
25. Farrer LA and Conneally PM (1985). A genetic model for age of onset in Huntington disease. Am J Hum Genet 37:350-357.
26. Myers RH, Madden JJ, Teague JL, Falek A (1982). Factors related to onset age in Huntington's disease. Am J Hum Genet 34:481-488.
27. Adams P, Falek A, Arnold J (1988). Huntington disease in Georgia: age at onset. Am J Hum Genet 43:695-704.
28. Hayden MR, Soles JA, Ward RH (1985). Age of onset in siblings of persons with juvenile onset Huntington disease. Clin Genet 28:100-105.
29. Stevens DL (1972). The heterozygote frequency for Huntington's chorea. In: Huntington's chorea, 1872-1972. Barbeau A, Chase TN, Paulson GW, eds. Raven Press, New York, pp 191-198.
30. Duyao M, Ambrose C, Myers R, Novelletto A, Persichetti F, Frontali M, Folstein S, Ross C, Franz M, Abbott M, Gray J, Conneally P, Young A, Penney J, Hollingsworth Z, Shoulson I, Lazzarini A, Falek A, Koroshetz W, Sax D, Bird E, Vonsattel J, Bonilla E, Alvir J, Bickman Conde J, Cha J-H, Dure L, Gomez F, Ramos M, Sanchez-Ramos J, Snodgrass S, deYoung M, Wexler N, Moscowitz C, Penchaszadeh G, MacFarlane H, Anderson M, Jenkins B, Srinidhi J, Barnes G, Gusella J, MacDonald M (1993). Trinucleotide repeat length instability and age of onset in Huntington's disease. Nature Genet 4:387-392.
31. Snell R, MacMillan JC, Cheadle JP, Fenton I, Lazarou LP, Davies P, MacDonald ME, Gusella JF, Harper PS, Shaw DJ (1993).
Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington disease. Nature Genet 4:393-397.
32. Telenius HT, Kremer HPH, Theilmann J, Andrew SE, Almqvist E, Anvret M, Greenberg C, Greenberg J, Lucotte G, Squitieri F, Starr E, Goldberg YP, Hayden MR (1993). Molecular analysis of juvenile Huntington disease: the major influence on (CAG)n repeat length is the sex of the parent. Hum Mol Genet 2:1535-1540.
33. Zuhlke C, Riess O, Schroder K, Siedlaczck I, Epplen JT, Engel W, Thies U (1993). Expansion of the (CAG)n repeat causing Huntington disease in 352 patients of German origin. Hum Mol Genet 2:1467-1469.
34. Norremolle A, Riess O, Epplen JT, Fenger K, Hasholt L, Sorensen SA (1993). Trinucleotide repeat elongation in the Huntingtin gene in Huntington disease patients from 71 Danish families. Hum Mol Genet 2:1475-1476.
35. Stine OC, Pleasant N, Franz ML, Abbott MH, Folstein SE, Ross CA (1993). Correlation between the onset age of Huntington's disease and length of the trinucleotide repeat in IT-15. Hum Mol Genet 2:1547-1549.

CHAPTER 8
SENSITIVITY AND SPECIFICITY OF CAG EXPANSION IN HUNTINGTON DISEASE

The work presented in this chapter has contributed to one manuscript.

Kremer B, Goldberg YP, Andrew SE, Theilmann J, Telenius H, Zeisler J, Squitieri F, Lin B, Adam S, Benjamin C, Greenberg J, Lucotte G, Almqvist E, Bird TD, Schellenberg GD, Bassett A, Almqvist E, Bird T, Hayden MR. A worldwide study of the Huntington disease mutation: The sensitivity and specificity of measuring CAG repeats. NEJM 330:1401-1406.

8.1 INTRODUCTION

Prior to the cloning of the gene for HD, the accuracy of predictive testing was limited because of potential recombination between inherited DNA markers and the site of the mutation for the disease. In addition, a significant proportion of persons were excluded from testing because of the unavailability of blood from family members.
Prior to a definitive test, the diagnosis of HD in a symptomatic person was often complicated by the subtlety of early signs and symptoms, which often mimic other disorders. The discovery of the CAG repeat mutation in patients with HD or at risk for the disease adds diagnostic accuracy in the clinical setting.

However, certain limitations of the initial published data should be noted before these data are used for diagnostic purposes. Most persons in the initial analyses were of Western European descent¹⁻⁵. Before the CAG repeat is used routinely as a marker for inheritance of HD, it is important to ascertain its sensitivity for detecting affected persons of different ancestries in different countries. Furthermore, the specificity of CAG expansion for HD must be assessed.

8.2 RESULTS

DNA was analyzed from families living in Canada as well as families from many different parts of the world, including persons of European, Asian, Black African, Arab and Native Indian descent. The ethnic origins were defined by the country from which the ancestor with HD originated. The racial and geographic origins of these DNA samples from families with HD are shown in Table 8-1.

Table 8-1. Distribution of allele sizes according to country of origin and ethnic background

Origin                     Individuals  Pedigrees  CAG size, median (range)
AFRICAN
  S. Africa-Black          5            2          52 (39-55)
ARAB
  Syria                    4            1          42 (38-45)
  Egypt                    2            2          42, 43
  Lebanon                  2            1          41, 42
  Saudi Arabia             3            1          47 (41-50)
ASIAN
  China                    8            5          49 (43-51)
  Japan                    3            2          46 (45-51)
  East India               4            2          43 (41-59)
CAUCASIAN
  Europe
    Austria                2            2          42, 61
    Belgium                1            1          44
    Czech Republic         2            2          46, 52
    Denmark                2            2          43, 48
    Netherlands            28           13         43.5 (37-59)
    England                117          63         44 (38-63)
    France                 45           31         44 (36-100)
    Germany                46           26         44 (40-65)
    Great Britain          74           38         44 (39-121)
    Greece                 2            2          40, 46
    Hungary                6            4          43.5 (40-49)
    Ireland                43           27         42 (39-52)
    Italy                  52           33         44 (39-54)
    Latvia                 1            1          41
    Lithuania              1            1          44
    Malta                  2            2          50
    Norway                 23           7          46 (39-71)
    Poland                 8            6          44 (43-48)
    Romania                1            1          42
    Russia                 22           11         41 (37-47)
    Scotland               91           47         43 (38-71)
    Sweden                 103          46         43 (38-88)
    Ukraine                10           6          45 (40-49)
    Wales                  3            3          46, 47
    Fmr. Yugoslavia        6            2          43 (42-48)
  North America
    French Canada          35           17         45 (36-75)
    Mexico                 2            2          47, 53
  South America
    Chile                  5            1          41 (39-45)
    Ecuador                3            2          47 (45-49)
    El Salvador            1            1          43
    Venezuela              1            1          48
  Australia                1            1          44
  S. Africa-Caucasian      11           4          51 (42-57)
  S. Africa-Dutch          4            2          48.5 (47-53)
  S. Africa-French         3            1          52 (49-55)
  S. Africa-Indian         1            1          50
  S. Africa-Mixed          8            2          47 (44-57)
NATIVE AMERICAN
  Native American          13           3          48 (43-58)
  Metis                    3            1          44 (41-46)
UNKNOWN ORIGIN             181          131        43 (39-49)
TOTAL (43 ancestries)      995          565        44 (36-121)

8.2.1 CAG SIZES IN HD AND OTHER NEUROPSYCHIATRIC DISORDERS

After review of 1022 patients, 1005 had signs and symptoms compatible with a diagnosis of HD. Of these, 995 persons had CAG repeat lengths ranging from 36 to 121, with a median of 44 repeats. The 995 affected persons were from 565 separate pedigrees and 43 different ancestries. No significant differences in allele sizes (ANOVA) were seen among affected persons of various ancestries, including 5 different racial groups. It is noteworthy, however, that a trend towards an increased median CAG size was seen in affected South Africans of black and mixed ancestry, as well as in persons with HD of Chinese, Japanese, Saudi Arabian and Native Indian descent.
However, because of the small sample sizes, no conclusions can be drawn. The increased mean CAG size in South Africans of mixed ancestry is consistent with an increased frequency of juvenile and early onset HD in this population⁶.

In 10 persons, CAG repeat lengths were within the normal range (Figure 8-1a). In those persons diagnosed with HD who had a CAG repeat length of less than 30, reassessment of CAG repeat size confirmed the initial results. In all these individuals, additional DNA was requested and repeat measurements of CAG length were performed where possible. In addition, the clinical records of these 10 individuals were re-examined, and additional records, including reports of neuropathological assessment where available, were reviewed. These potential phenocopies are presented in Chapter 10.

Patients with a clear family history of familial Alzheimer's disease (n = 44), schizophrenia (n = 47), neurocanthocytosis (n = 2), benign hereditary chorea (n = 5) and dentato-rubro-pallidoluysian atrophy (DRPLA) (n = 2) were also assessed for CAG expansion. The range of CAG repeat length is shown in Table 8-2. Clearly, the range of CAG repeat size in these disorders is similar to that seen on normal human chromosomes and shows no overlap with that seen in HD.

[Figure 8-1a. Distributions of CAG repeat lengths on different chromosomes. Upper allele sizes in 1005 persons with clinical signs and symptoms of Huntington disease; a total of 995 have CAG repeat lengths of 36 or greater. Panel: HD alleles, n = 995, median = 44, range 36-121; x-axis: (CAG)n.]

Table 8-2. The distribution of CAG repeat length in Huntington disease and other neuropsychiatric disorders

                           No. of alleles  CAG repeat  CAG repeat  CAG repeat
Disease                    tested          range       mean size   median size
Huntington disease
  HD alleles               995             36-121      45.3        44
Alzheimer disease          88              12-24       18.8        19
Schizophrenia              94              16-25       19.1        19
Benign hereditary chorea   10              16-23       18.7        18
Neurocanthocytosis         4               17-20       18.5        18.5
DRPLA                      4               19-20       19.5        19.5
Controls
  Non-Huntington alleles   995             10-36       19.1        19
  Intermediate alleles     12              30-35
  Alleles in HD range      1               36
  Control alleles          600             10-39       18.4        18

8.2.2 CAG SIZES IN CONTROLS

CAG repeat length was also assessed in 300 control subjects of Caucasian, Black African and Chinese descent in order to determine its range in individuals without a family history of any neuropsychiatric disorder. The normal CAG size, as defined by 600 chromosomes from control subjects, ranged from 10 to 39 (Figure 8-1b, Table 8-3). The distribution appeared bimodal, with a CAG size of 18 relatively underrepresented compared with peaks at 17 and 19 repeats.

In addition to these 600 control chromosomes, the 995 chromosomes from the affected individuals not containing the CAG repeat expansion were used to study the normal range of CAG repeats (Figure 8-1c). Of these, 983 (98.8%) had a CAG size between 10 and 29, the previously determined normal range⁷. Again, the distribution appeared bimodal, with peaks at 17 and 19 and a CAG repeat size of 18 underrepresented.

Comparisons between CAG lengths in controls of Caucasian, Black and Chinese descent reveal differences in CAG repeat distribution between Caucasians and both other groups (Table 8-3).
However, even though these differences are statistically significant, they are small and will not have clinical relevance. In addition, 12 control chromosomes (unaffected alleles in persons with HD) of Caucasian descent were found with a CAG size between 30 and 35, which represents a frequency of 0.75% of intermediate sized alleles (IAs) in this population.

[Figure 8-1b. Allele sizes on 600 control chromosomes from 300 individuals. One CAG allele is in the Huntington disease range at 39 repeats. Panel: control alleles, n = 600, median = 18, range 10-39; x-axis: (CAG)n.]

Table 8-3. CAG size distribution for control chromosomes of different ethnic origins

Origin            No.    Median (range)
CAUCASIAN         226    19 (10-35) *#
  Australia       2      17, 22
  France          2      19, 20
  Germany         8      18 (10-26)
  Great Britain   32     18.5 (12-35)
  Ireland         6      20 (16-21)
  Italy           28     19 (15-29)
  Lithuania       2      12, 12
  Netherlands     2      11, 17
  Norway          10     19.5 (17-27)
  Poland          2      19, 29
  Russia          8      18 (15-22)
  Scotland        20     19 (14-25)
  Sweden          102    19 (11-29)
BLACK             112    17 (11-29) *
CHINESE           10     17 (16-20) #
ARAB              2      17, 17
* p = 0.003 (Caucasian vs. Black); # p = 0.012 (Caucasian vs. Chinese)

[Figure 8-1c. Lower allele sizes of the 995 affected persons with expanded upper alleles. Note that one allele, at 37 repeats, is in the Huntington disease range. Panel: non-HD alleles, n = 995, median = 19, range 10-37; x-axis: (CAG)n.]

One person with clinical signs and symptoms of HD had two alleles in the HD range (37 and 43). This person had onset at age 50, with clinical deterioration over 12 years consisting of progressive chorea, cognitive decline, dysarthria, dysphagia and severe cachexia. Post-mortem examination revealed generalized brain atrophy (brain weight 1100 g) and marked caudate atrophy. Her father died unaffected in his forties, while her mother developed HD in her fifties. This patient, although genetically homozygous for the HD mutation, had features typical of a heterozygote for HD⁶.

One control individual, the spouse of an affected person and herself without signs or symptoms of the disease at age 57, had a CAG repeat length of 39 repeats on one chromosome.
There was no history of HD in any of her ancestors. This result was rechecked, including analysis of her offspring, which confirmed the finding and also revealed one offspring homozygous for CAG alleles in the HD range. This person, aged 25, is currently asymptomatic and has CAG repeat lengths of 39 and 42.

8.3 DISCUSSION

The purpose of this analysis was to demonstrate the sensitivity and specificity of CAG trinucleotide repeat length as a marker for inheritance of the HD gene. A total of 995/1005 (99.0%) of persons who, after review, were considered to have a clinical diagnosis of HD were shown to have expanded CAG repeat lengths above the range seen on normal human chromosomes. This was observed in affected persons from 43 different countries and 5 different racial groups, including persons of Caucasian, Arab, Black African, Chinese, Japanese and Native Indian descent. These results support the previously reported findings on the sensitivity of CAG expansion in a smaller group of presumed Caucasian patients⁸. In contrast, no CAG expansion was seen in other neuropsychiatric disorders that have previously been misdiagnosed as HD, such as familial Alzheimer's disease, familial schizophrenia, benign hereditary chorea or DRPLA. CAG expansion therefore underlies HD worldwide, which suggests that it is directly related to the causation of HD, even though the mechanism by which this occurs is still unknown. The high sensitivity and specificity of CAG expansion for the inheritance of HD have significant implications both for the assessment of symptomatic persons and for predictive testing programs.

For persons at risk for HD, a direct test for inheritance of the mutation will allow a more accurate assessment of genetic risk without the need for DNA from family relatives.
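The sensitivity figure above is a simple proportion: 995 of the 1005 persons with a clinical diagnosis of HD carried an expanded repeat. The Python sketch below computes such a proportion together with a 95% Wilson score interval; the interval is added here for illustration and is not reported in the thesis. The second call applies the same function to the intermediate-allele frequency, assuming its denominator is the 1595 normal-range chromosomes studied (600 control chromosomes plus the 995 non-HD chromosomes of affected persons).

```python
import math


def proportion_with_wilson(successes: int, total: int, z: float = 1.96):
    """Return (estimate, lower, upper): the observed proportion and its
    Wilson score interval (z = 1.96 gives an approximate 95% interval)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return p, centre - half_width, centre + half_width


if __name__ == "__main__":
    sens, lo, hi = proportion_with_wilson(995, 1005)
    print(f"sensitivity: {sens:.1%} (95% CI {lo:.1%} to {hi:.1%})")
    # Assumed denominator: 600 control + 995 non-HD chromosomes.
    freq, lo, hi = proportion_with_wilson(12, 600 + 995)
    print(f"intermediate alleles: {freq:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

The point estimates round to 99.0% and 0.75%, matching the figures given in the text.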
However, misdiagnosis and human error remain sources of error, and assessing an affected family relative will allow confirmation that CAG expansion is present in the family. This will facilitate correct interpretation of a normal sized CAG repeat in someone at risk.

Direct DNA testing will be particularly useful in symptomatic persons for whom the family history of HD is uncertain and for whom the natural course of the illness has not been documented. Clearly, an expanded CAG repeat within the HD gene is a highly specific marker for inheritance of the gene for HD and can be used to differentiate HD from other neuropsychiatric disorders that were commonly misdiagnosed as HD in the past, such as Alzheimer's disease, schizophrenia, neurocanthocytosis, benign hereditary chorea and DRPLA. Of note, for some cases of DRPLA, CAG repeat size may represent the only means of differentiating these two disorders during life⁹.

Two previously unsuspected homozygotes for HD were identified by direct detection of the expanded allele on both chromosomes. Prior reports used linked markers for inheritance of the gene and were theoretically subject to error due to recombination between the markers and the mutation¹⁰,¹¹. Here, the clinical phenotype and pathological findings in one affected homozygote, which resembled those of a heterozygote, and the absence of symptoms in a 25-year-old homozygote support previously reported findings that the phenotype of the homozygote is not more severe than that of the heterozygote. This is consistent with CAG expansion conferring a gain of function in the pathogenesis of HD.

A total of 12 CAG alleles (0.75%) with sizes between 30 and 35 were seen on control chromosomes. Surprisingly, all of these were seen on the non-HD chromosomes of affected persons.
However, other studies have shown that these intermediate sized alleles (IAs) exist in the normal population at a low frequency, similar to that seen on the non-HD alleles in this analysis². New mutations for HD arise from IAs when transmitted through the male germline (Chapter 10). The stability of these IAs on control chromosomes is uncertain, but it is likely that they represent the pool from which new mutations for HD arise.

CAG expansion underlies the worldwide distribution of HD in persons of various ancestries and racial groups. In addition to being sensitive for indicating inheritance of HD, CAG expansion is also highly specific, not being seen in persons with other neuropsychiatric disorders with which HD is frequently confused.

8.4 REFERENCES

1. Duyao M, Ambrose C, Myers R, Novelletto A, Persichetti F, Frontali M, Folstein S, Ross C, Franz M, Abbott M, Gray J, Conneally P, Young A, Penney J, Hollingsworth Z, Shoulson I, Lazzarini A, Falek A, Koroshetz W, Sax D, Bird E, Vonsattel J, Bonilla E, Alvir J, Bickman Conde J, Cha J-H, Dure L, Gomez F, Ramos M, Sanchez-Ramos J, Snodgrass S, deYoung M, Wexler N, Moscowitz C, Penchaszadeh G, MacFarlane H, Anderson M, Jenkins B, Srinidhi J, Barnes G, Gusella J, MacDonald M (1993). Trinucleotide repeat length instability and age of onset in Huntington's disease. Nature Genet 4:387-392.
2. Snell R, MacMillan JC, Cheadle JP, Fenton I, Lazarou LP, Davies P, MacDonald ME, Gusella JF, Harper PS, Shaw DJ (1993). Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington's disease. Nature Genet 4:393-397.
3. Telenius HT, Kremer HPH, Theilmann J, Andrew SE, Almqvist E, Anvret M, Greenberg C, Greenberg J, Lucotte G, Squitieri F, Starr B, Goldberg YP, Hayden MR (1993). Molecular analysis of juvenile Huntington disease: The major influence on (CAG)n repeat length is the sex of the transmitting parent. Hum Mol Genet 2:1535-1540.
4. Zuhlke C, Riess O, Schroder K, Siedlaczck I, Epplen JT, Engel W, Thies U (1993). Expansion of the (CAG)n repeat causing Huntington disease in 352 patients of German origin. Hum Mol Genet 2:1467-1469.
5. Norremolle A, Riess O, Epplen JT, Fenger K, Hasholt L, Sorensen SA (1993). Trinucleotide repeat elongation in the Huntingtin gene in Huntington disease patients from 71 Danish families. Hum Mol Genet 2:1475-1476.
6. Hayden MR (1981). Huntington's chorea. Springer-Verlag, New York.
7. Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman MA, Graham RK, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington disease. Nature Genet 4:398-403.
8. MacMillan JC, Snell RG, Tyler A, Houlihan GD, Fenton I, Cheadle JP, Lazarou LP, Shaw DJ, Harper PS (1993). Molecular analysis and clinical correlations of the Huntington disease mutation. Lancet 342:954-958.
9. Iizuka R, Hirayama K, Maehara K (1984). Dentato-rubro-pallido-luysian atrophy: a clinicopathological study. J Neurol Neurosurg Psychiat 47:1288-1298.
10. Wexler NS, Young AB, Tanzi RE, Travers H, Starosta-Rubenstein S, Penney JB, Snodgrass SR, Shoulson I, Gomez F, Arrayo MAR, Penchaszadeh GK, Moreno H, Gibbons K, Faryniarz A, Hobbs W, Anderson MA, Bonilla E, Conneally PM, Gusella JF (1987). Homozygotes for Huntington's disease. Nature 326:194-197.
11. Myers RH, Leavitt J, Farrett L, Jagadeesh J, McFarlane H, Mastromauro CA, Mark RJ, Gusella J (1989). Homozygote for Huntington's disease. Am J Hum Genet 45:615-618.
12. Zuhlke C, Riess O, Bockel B, Lange H, Thies U (1993). Mitotic stability and meiotic variability of the CAG repeat in the Huntington disease gene. Hum Mol Genet 2:2063-2067.

CHAPTER 9
HUNTINGTON DISEASE WITHOUT CAG EXPANSION

The work in this chapter has contributed to one manuscript.

Andrew SE, Goldberg YP, Kremer B, Squitieri F, Theilmann J, Zeisler J, Telenius H, Adam S, Almqvist E, Anvret M, Lucotte G, Stoessl AJ, Campanella G, Hayden MR (1994). Huntington disease without CAG expansion: Phenocopies or errors in assignment? Am J Hum Genet 54:852-865.

9.1 INTRODUCTION

The prior reports of the relationship between trinucleotide repeat length and clinical features of HD each described a small number of persons who had been given the diagnosis of HD but were found not to have CAG repeat sizes in the range seen in affected persons¹⁻⁴. Accurate assessment of the reasons for the failure to demonstrate expanded CAG repeats in persons diagnosed with HD is critical in determining the sensitivity of CAG repeat length for the diagnosis of HD in symptomatic patients. This is also important for predictive testing programs, where a CAG repeat length in the normal range may be mistaken for absolute proof that the person at risk will not develop signs and symptoms consistent with the phenotype of HD in the future.

A total of 1022 individuals from 573 families of 43 different ancestries diagnosed with HD were assessed for CAG repeat length. Those persons who did not have an expanded CAG repeat in the affected range were assessed more fully to determine the reasons for the failure to detect expanded CAG repeats in all such presumably affected persons.

It is possible that on very rare occasions, genetic heterogeneity may underlie the presentation of the HD phenotype. Other disorders associated with dynamic mutations provide support for multiple mutations being responsible for disease.
Two patients with a phenotype characteristic of fragile X syndrome, but lacking both cytogenetic expression of FRAXA and CCG expansion, have been reported [5,6]. In both cases, mutations in the FMR-1 gene other than expansion of the CCG repeat in the 5' UTR were shown to be responsible for the fragile X syndrome. One patient carried a de novo point mutation within the FMR-1 gene, and the other had a submicroscopic deletion of more than 2 Mb of DNA encompassing the CCG repeat and the FMR-1 gene. However, while fragile X syndrome may demonstrate allelic heterogeneity, the fundamental disruption of FMR-1 in all cases described to date makes this disorder genetically homogeneous with respect to locus.

9.2 RESULTS

CAG repeat lengths were in the range seen on normal human chromosomes (10 to 30 repeats) in 30 of 1,022 persons who had been given the diagnosis of HD (Tables 9-1, 9-2 and 9-3). Clinical details were based on extensive records documenting neurological examination and special investigations such as computerized tomography (CT), positron emission tomography (PET) and autopsy. In all instances, patient records were reviewed in collaboration with the referring physician. PCR assessment was repeated for the individuals without expanded CAG alleles, using the initial DNA sample and, when available, a second independently obtained sample. A second sample was available in only 7 of the 30 cases; in the remaining cases it is not possible to determine whether a sample mix-up had occurred.

The most common cause of failure to detect CAG expansion in persons with supposed HD was error in assignment (18 persons), comprising misdiagnosis (10 persons), sample mix-up (6 persons) and clerical error (2 persons) (Table 9-1).

9.2.1 ERRORS IN ASSIGNMENT

Human error accounted for 8 of these misclassifications (Tables 9-1 and 9-3). In 2 cases the persons were at risk for HD but were recorded as affected.
In 6 individuals, sample mix-up took place before the CAG repeat length was assessed; this may have occurred at any point between blood withdrawal and CAG repeat assessment. PCR reassessment of additional blood samples showed that 3 of these 6 persons had CAG repeat sizes in the range of affected persons. In another instance, where an additional DNA sample was unavailable, detailed assessment of the family, including the parents and children, with other highly polymorphic markers in the region revealed that the blood sample analyzed was highly unlikely to have been derived from this patient. In two cases, DNA from an unaffected family member had been mistaken for DNA from an affected individual in that family.

Table 9-1. Reasons for Diagnosis of HD Without Expansion of the CAG Repeat Triplet (<37 Repeats) in 1022 Affected Patients
Reason | No. | %
UNEXPLAINED LACK OF CAG EXPANSION
Family history of neurological disease | 8 | 0.8
CAG expansion in other family members | 1 | 0.1
New mutations | 3 | 0.3
Total | 12 | 1.2
ERRORS IN ASSIGNMENT
Misdiagnosis | 10 | 1.0
Sample mix-up | 6 | 0.6
Clerical error | 2 | 0.2
Total | 18 | 1.8

Table 9-2. Unexplained lack of CAG expansion
Allele sizes are numbers of CAG repeats from the first PCR (PCR 1), a repeat PCR on the same sample, and a new DNA sample.
Patient | PCR 1 | Repeat PCR 1 | New sample | Classification | Family history of neurological disease | Comments
1 | 16,18 | 16,18 | NA | Phenocopy | Yes | Sib of 2
2 | 16,18 | 16,18 | NA | Phenocopy | Yes | Sib of 1
3 | 17,20 | 17,20 | 17,20 | Phenocopy | Yes | Cousin of 4; autopsy of affected parent shows caudate atrophy
4 | 16,19 | 16,19 | 16,19 | Phenocopy | Yes | Cousin of 3; autopsy of affected parent shows caudate atrophy
5 | 17,18 | 17,18 | NA | Phenocopy | Yes | Recombinant, previously described (Weber et al., 1992b)
6 | 18,19 | 18,19 | NA | Phenocopy | Yes | 4 persons with CAG expansion in family (Robbins et al., 1989; Pritchard et al., 1992; Duyao et al., 1993)
7 | 19,20 | 19,20 | NA | Phenocopy | Yes | Sib of 8
8 | 19,20 | 19,20 | NA | Phenocopy | Yes | Sib of 7
9 | 17,21 | 17,21 | NA | Phenocopy | Possible | -
10 | 15,21 | 15,21 | NA | Phenocopy/New mutation | No | Typical features of HD, normal PET
11 | 17,19 | 17,19 | NA | Phenocopy/New mutation | No | -
12 | 16,16 | 16,16 | 16,16 | Phenocopy/New mutation | No | Typical features of HD, confirmed by CT and PET
NA = Second DNA sample not available.

Table 9-3. Errors of assignment
Allele sizes are numbers of CAG repeats from the first PCR (PCR 1), a repeat PCR on the same sample, and a new DNA sample.
Patient | PCR 1 | Repeat PCR 1 | New sample | Classification | Family history of neurological disease | Comments
13 | 14,21 | 14,21 | NA | Misdiagnosis | Yes | 1 person with CAG expansion in family; behaviour disturbance misdiagnosed as juvenile HD
14 | 17,17 | 17,17 | NA | Misdiagnosis | Yes | 2 persons with CAG expansion in family; minor motor abnormalities
15 | 16,17 | 16,17 | NA | Misdiagnosis | Yes | Alcoholic, psychiatric disturbance
16 | 16,20 | 16,20 | NA | Misdiagnosis | Yes | 4 persons with CAG expansion in family
17 | 18,22 | 18,22 | NA | Misdiagnosis | Yes | 3 persons with CAG expansion in family; neuropathological misdiagnosis of HD
18 | 17,20 | 17,20 | NA | Misdiagnosis | Yes | 2 persons with CAG expansion in family; psychiatric disturbance
19 | 17,28 | 17,28 | NA | Misdiagnosis | Yes | Dementia, probable Alzheimer’s disease
20 | 19,19 | 19,19 | NA | Misdiagnosis | No | Alzheimer’s disease, confirmed by autopsy
21 | 19,20 | 19,20 | NA | Misdiagnosis | No | No caudate atrophy (CT 14 years after diagnosis)
22 | 18,21 | 18,21 | NA | Misdiagnosis | No | Progressive dementia and tardive dyskinesia, previous haloperidol use
23 | 17,28 | 17,28 | NA | Sample mix-up | Yes | Mix-up with "at risk" child
24 | 20,22 | 20,22 | NA | Sample mix-up | Yes | Mix-up with unaffected sib
25 | 16,17 | 16,17 | 19,44 | Sample mix-up | Yes | 1 person with CAG expansion in family
26 | 19,19 | 19,19 | 17,40 | Sample mix-up | Yes | 2 persons with CAG expansion in family
27 | 16,20 | 16,20 | 16,43 | Sample mix-up | Yes | 1 person with CAG expansion in family
28 | 19,20 | 19,20 | NA | Sample mix-up | Yes | 2 persons with CAG expansion in family; haplotypes show sample mix-up
29 | 11,16 | ND | NA | Clerical error | Yes | Unaffected individual
30 | 17,19 | 17,19 | 17,19 | Clerical error | Yes | Individual "at risk"
NA = Second DNA sample not available; ND = Not done.

9.2.2 MISDIAGNOSIS (FIGURE 9-1) (TABLE 9-3)

Misdiagnosis of HD was the major cause (10 persons, 1.0%) of the finding of two normal alleles in persons with a presumed HD phenotype (Tables 9-1 and 9-3). In 7 instances, patients from families with HD who had behavioural, psychiatric or minor motor disturbances were incorrectly diagnosed as having HD (patients 13-19). It is noteworthy that in one instance (patient 17) the initial neuropathological examination confirmed the diagnosis of HD.
However, a second assessment, made before the CAG repeat length was known, clearly indicated that the pathology findings were not consistent with HD. It was noted that the positive family history of HD might have influenced the first neuropathological assessment. In one instance, a patient whose only presenting feature was dementia, and who had no family history of HD, was diagnosed as suffering from HD (patient 20). One patient had a tremor and a gait ataxia that were likely due to chronic alcohol intake (patient 15), while in another, dementia associated with tardive dyskinesia was induced by neuroleptic drugs (patient 22).

Figure 9-1. Pedigrees of misdiagnosed HD cases. Patients are marked with arrows and sizes of CAG repeats are shown.

9.2.3 ABSENCE OF CAG EXPANSION IN INDIVIDUALS FROM FAMILIES WITH CAG EXPANSION IN OTHER AFFECTED PERSONS (TABLES 9-2, 9-3)

Ten persons who had affected family members with CAG expansion, and whose clinical signs strongly raised the possibility of HD, were shown not to have CAG repeat expansion. In 4 of these instances, sample mix-up accounted for the finding (Table 9-3, patients 25, 26, 27 and 28). In one instance (Table 9-2, patient 6), two CAG alleles within the normal range were detected in a family that has been described in detail on three previous occasions [1,7,8]. In five instances (Table 9-3, patients 13, 14, 16, 17 and 18), persons from families with proven HD who had minor motor abnormalities or behavioural disturbances were misdiagnosed as having HD.

9.2.4 LOCUS HETEROGENEITY OF HD?

Pedigrees of families containing individuals whose lack of CAG expansion is not attributable to misdiagnosis, sample mix-up or clerical error are shown in Figure 9-2.
Haplotypes were constructed for families with a history of HD (patients 1, 2, 3 and 4) (Figures 9-2a and 9-2b). Haplotypes for patients 5 and 6 have been determined previously and are shown in Figures 9-2c and 9-2d’7. Pedigrees of the remaining individuals are shown in Figure 9-2e.

A total of 10 persons from 6 different families had a positive family history of a progressive neurological disorder associated with intellectual decline, chorea and personality change, but lacked CAG expansion (Tables 9-1 and 9-2). In two instances the affected individuals were siblings (Figure 9-2a; Figure 9-2e, Pedigree C), while in another instance the 2 affected persons were first cousins (Figure 9-2b). Haplotype analysis with 6 markers in one family (Figure 9-2b) demonstrated that the 2 affected individuals (patients 3 and 4) had not inherited the same chromosome 4 in this region; the cause of the HD phenotype in this family is therefore unlikely to be a gene in this region of chromosome 4. Haplotype analysis in the family of patients 1 and 2 (Figure 9-2a) shows that both affected persons share identical haplotypes. An older sibling with 2 different parental alleles shows possible early signs of HD. If this individual is definitively diagnosed with HD in the future, then 4p16.3 can be excluded as the site responsible for the phenotype in this family.

Figure 9-2a. Pedigree for patients 1 and 2 (indicated by arrows) including haplotypes showing that each has received identical parental chromosomes.
The eldest sibling, whose parental alleles differ from those of patients 1 and 2, shows possible early signs of HD (shown hatched); if confirmed as HD in the future, this would exclude this region of chromosome 4 as responsible for the phenotype in this family.

Figure 9-2b. Pedigree for patients 3 and 4 (indicated by arrows) including haplotypes at 6 markers, showing that patients 3 and 4 have each received a different chromosome 4 from their affected parents, which excludes this region of chromosome 4 as responsible for the phenotype in this family.

Figure 9-2c. Previously published* pedigree for patient 5 (indicated by arrow) including haplotypes at various 4p16.3 markers. The affected chromosome is shown boxed.
* Weber B, Riess O, Wolff G, Andrew S, Collins C, Graham R, Theilmann J, Hayden MR (1992). Nature Genet 2:216-222.

Figure 9-2d. Previously published*# pedigree for patient 6 (indicated by arrow) including haplotypes at various 4p markers. The Huntington disease chromosome is shown boxed.
Haplotype analysis of patient 6 initially suggested that the gene was distal to these three markers#; however, subsequent analysis with an additional telomeric marker (D4S169) excluded the telomere*.
# Robbins C, Theilmann J, Youngman S, Haines J, Altherr MJ, Harper PS, Payne C, Junker A, Wasmuth J, Hayden MR (1989). Am J Hum Genet 44:422-425.
* Pritchard C, Zhu N, Zuo J, Bull L, Pericak-Vance M, Vance JM, Roses AD, Milatovitch A, Francke U, Cox DR, Myers RM (1992). Am J Hum Genet 50:1218-1230.

Figure 9-2e. Pedigrees of patients with unexplained lack of CAG expansion. Patients are marked with arrows and sizes of CAG repeats are given where DNA was available.

In a previously described patient (patient 5) [9], DNA marker analysis had indicated that the mutation associated with HD was distal to the region containing the HD gene (Figure 9-2c). Similarly, patient 6, from a previously described family [1,7,8], does not share 4p16.3 markers with other affected family members who show CAG expansion (Figure 9-2d). This indicates that mutations in other genes, outside this region of chromosome 4p16.3, are likely to be associated with a clinical phenotype very similar to HD.

9.3 DISCUSSION

The cardinal finding of this study of 1022 persons affected with HD is that 3.0% of persons (n = 30) with the diagnosis of HD initially had CAG repeat sizes within the normal range. After investigation, 18 of these persons were found to represent misclassifications (misdiagnosis, sample mix-up or clerical error), while 12 affected persons represent possible cases of HD-like symptoms not caused by CAG expansion in the HD gene.

The assignment of a case as misdiagnosis could be arbitrary, depending on how consistent the symptoms were with the classical diagnosis of HD.
In this analysis, however, the distinction was based on whether an alternative diagnosis was made because the patient had a clinical and neuropathological phenotype more suggestive of another known disorder, such as benign hereditary chorea, DRPLA, inherited ataxia, neuroacanthocytosis or Alzheimer’s disease. Patients with a phenotype similar to HD who do not fulfil the criteria for these other disorders and do not have CAG expansion in the HD gene may therefore represent previously undescribed HD-like disorders.

Clinical misdiagnosis is not rare in HD families, accounting for 10 misclassifications in this cohort (1.0%). It has previously been demonstrated that patients with HD are often misdiagnosed as suffering from other illnesses, including schizophrenia and Alzheimer’s disease [10,11]. This study shows that misdiagnosis of neurological symptoms as HD in families with a positive history of HD is a significant source of error, which reinforces the need for caution before attributing all neuropsychiatric symptoms to HD in such families.

In families where other relatives have a history of neuropsychiatric illness consistent with autosomal dominant inheritance, the absence of CAG expansion in the HD gene does not exclude the possibility that the person at risk will manifest similar signs and symptoms. This highlights the need, where possible, to include DNA from an affected person in all analyses of at-risk persons, which allows more accurate genetic counseling. If the affected relative has CAG expansion and the person at risk does not, this provides reassurance that the person at risk will not develop signs and symptoms of HD.
However, if the affected person does not show CAG expansion in the HD gene, there may be another neurogenetic disorder in the family. In this situation, a person at risk without CAG expansion could not be reassured that they will not develop signs and symptoms of a similar disorder.

In DRPLA, the phenotype may be similar to HD [12-14]. The two illnesses may, however, be differentiated by neuropathological examination, in which major involvement of the globus pallidus, subthalamic nucleus and dentate nuclei distinguishes DRPLA from HD’5-’7, and now by assessment for CAG expansion in two different novel genes’°”1”5. Recently, expansion of a trinucleotide (CAG) repeat on chromosome 12 was shown to be associated with DRPLA’°-”. Assessment of CAG length in the DRPLA gene in patients 1-22 showed that all had CAG lengths within the normal range (7-23 repeats), suggesting that none of the individuals in this series lacking CAG expansion at the HD locus can be classified as having DRPLA.

Human error accounted for 8 misclassifications (0.8%) in this series: sample mix-up accounted for the failure to detect trinucleotide repeat expansion in 6 persons (0.6%), while clerical errors accounted for two misclassifications (0.2%). This stresses the importance, where possible, of obtaining additional samples for reassessment of persons who have signs and symptoms compatible with HD but who do not show CAG repeat expansion, before concluding that they lack repeat expansion.

After misassignments due to misdiagnosis, sample mix-up and clerical error are taken into account, only 12 persons from 9 families represent unexplained cases lacking CAG expansion (Table 9-1). Constructing haplotypes within each family shows whether a particular DNA sample is consistent with the other family chromosomes, so sample mix-up is unlikely to explain those cases in which a second DNA sample is unavailable.
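The two sample-integrity checks discussed above, re-typing a second sample and testing whether a sample's genotype fits the rest of the family, can be sketched as follows. This is a minimal illustration, not the laboratory's procedure: the function names are hypothetical, genotypes are assumed to be unordered pairs of allele sizes, and the stable-marker genotype in the usage lines is invented, while the CAG sizes are taken from Tables 9-3 and 10-1A.

```python
from itertools import product

# A genotype is an unordered pair of allele sizes (CAG repeats or marker alleles).
Genotype = tuple[int, int]


def same_genotype(first: Genotype, second: Genotype) -> bool:
    """True if two samples attributed to one person carry the same two alleles."""
    return sorted(first) == sorted(second)


def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child's alleles can be explained as one maternal plus one paternal allele."""
    return any(sorted(child) == sorted(pair) for pair in product(mother, father))


# Table 9-3, patient 25: first sample 16,17 but new sample 19,44, which flags a mix-up.
print(same_genotype((16, 17), (19, 44)))  # False
# Hypothetical stable marker: child 3/6 from mother 3/4 and father 6/6 is consistent.
print(mendelian_consistent((3, 6), (3, 4), (6, 6)))  # True
# Table 10-1A, family 2: proband 17/44, mother 16/17, father 27/32. The expanded
# allele (44) matches neither parental size, so the CAG repeat fails a strict check.
print(mendelian_consistent((17, 44), (16, 17), (27, 32)))  # False
```

The last call illustrates why family consistency is better judged on stable flanking markers than on the unstable CAG repeat itself. Note also that a sample from a relative with a compatible genotype would pass this check, so it cannot rule out every mix-up within a family.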
Consistent patterns of inheritance do not, however, necessarily exclude sample mix-up between two members of one family. Unfortunately, the unavailability of second DNA samples in many instances prevents definitive exclusion of sample mix-up as an explanation for the lack of CAG expansion. These cases remain unexplained until more clinical data or additional DNA samples become available.

In 3 families where assessment was possible (patients 3-6), segregation analysis indicated that other mutations in the HD gene (IT15) leading to this phenotype are unlikely and that locus heterogeneity probably underlies the HD phenotype. In these instances, the genetic cause of the phenotype similar to HD in all likelihood lies outside the HD gene. In 8 patients (patients 1, 2 and 7-12) from 6 families, the possibility that a mutation in the HD gene other than CAG expansion is responsible for the HD phenotype has not been excluded. It is noteworthy, however, that on retrospective review of 11 of the 12 patients (patients 1 to 11), certain clinical features raised questions about the clinical diagnosis of HD. These included absence of the expected progression and failure to see characteristic changes on CT or PET scan after many years of illness. However, because no alternative diagnosis has been made in these cases, the diagnosis of HD has not been removed.

In families with demonstrated typical CAG expansion, there were initially 10 patients who did not show expansion. Sample mix-up accounted for 4 of these (patients 25-28), incorrect diagnosis was assigned in 5 cases (patients 13, 14, 16, 17 and 18), and the remaining patient (patient 6) represents an unexplained case. In this remaining instance, the natural history is not typical of HD and, in retrospect, the diagnosis in this patient is now in question. Misdiagnosis or sample mix-up are the most likely explanations for these findings.
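The exclusion argument used for patients 3-6 (affected relatives who did not inherit the same chromosome 4 segment cannot share a disease mutation located in that segment) can be sketched as a comparison of phased haplotypes. This is a deliberately simplified sketch, not the analysis performed here: it assumes the haplotypes are already phased, treats identical allele strings as the same ancestral chromosome (identity by state standing in for identity by descent), ignores recombination within the region, and uses invented marker alleles; the function names are hypothetical.

```python
# A phased haplotype lists the alleles at the same ordered set of markers.
Haplotype = tuple[str, ...]
Person = tuple[Haplotype, Haplotype]  # the two copies of the chromosome region


def shares_haplotype(a: Person, b: Person) -> bool:
    """True if any chromosome of person a is identical to a chromosome of person b."""
    return any(ha == hb for ha in a for hb in b)


def region_excluded(affected: list[Person]) -> bool:
    """True if at least one pair of affected relatives shares no haplotype.

    If a single dominant mutation segregates in the family, every affected
    relative should carry the same chromosome segment around the disease gene.
    """
    return any(
        not shares_haplotype(affected[i], affected[j])
        for i in range(len(affected))
        for j in range(i + 1, len(affected))
    )


# Invented alleles at four markers for two affected first cousins.
cousin_1: Person = (("6", "A", "B", "1"), ("4", "A", "A", "1"))
cousin_2: Person = (("5", "A", "B", "1"), ("3", "A", "A", "2"))
print(region_excluded([cousin_1, cousin_2]))  # True: no shared chromosome

# Invented alleles for two affected sibs with identical haplotypes.
sib_1: Person = (("3", "8"), ("6", "6"))
sib_2: Person = (("6", "6"), ("3", "8"))
print(region_excluded([sib_1, sib_2]))  # False: region not excluded
```

As with patients 1 and 2, sharing a haplotype does not implicate the region; it only fails to exclude it.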
Recent studies from our laboratory indicate that although somatic mosaicism is evident in HD, major differences in CAG repeat size between tissues do not occur except in persons with juvenile onset; mosaicism therefore cannot account for a normal CAG size in a patient whose other affected family members show CAG expansion [16].

The occurrence of individuals with an HD phenotype who lack CAG expansion raises the question of whether expansion of the CAG repeat is the mutation causing HD. To demonstrate that a DNA alteration is the causative mutation rather than merely a closely associated marker, complete association of the alteration with the disease must be observed, together with absence of the alteration from control chromosomes. CAG expansion is associated with 99.0% of all HD individuals, and those lacking expansion have not yet been excluded as misdiagnoses, sample mix-ups or some other error. In addition, expansion into the affected range has not been seen in control individuals, so it appears that the CAG repeat is causative for the disorder. However, until animal models have demonstrated that the expanded repeat is responsible for disease, the possibility that the CAG expansion is not the primary mutation but a closely associated marker cannot be entirely excluded.

These findings have significant implications for understanding the pathogenesis of HD. They indicate that mutational mechanisms other than CAG expansion that alter this gene rarely, if ever, lead to this phenotype. A total of 30 persons given the clinical diagnosis of HD in this series of 1022 patients (3.0%) did not have expansion of the CAG repeat. The majority of these (1.8%) were errors of assignment. After investigation, at most 12 persons (1.2%) show an unexplained lack of CAG expansion. It is likely that in most instances these patients come from families with HD-like disorders.
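The percentages in Table 9-1 and in the summary above are simple proportions of the 1022 assessed persons. The short calculation below, a sketch that assumes only the counts listed in Table 9-1, reproduces the per-category and subtotal values. Computed directly, the overall 30/1022 comes to about 2.9%; the 3.0% quoted in the text equals the sum of the two rounded subtotals (1.2% + 1.8%).

```python
COHORT = 1022  # persons with a diagnosis of HD assessed for CAG repeat length

# Persons without CAG expansion, by category (counts from Table 9-1).
counts = {
    "Family history of neurological disease": 8,
    "CAG expansion in other family members": 1,
    "New mutations": 3,
    "Misdiagnosis": 10,
    "Sample mix-up": 6,
    "Clerical error": 2,
}


def percent(n: int, total: int = COHORT) -> float:
    """Share of the cohort in percent, rounded to one decimal place as in Table 9-1."""
    return round(100 * n / total, 1)


unexplained = sum(counts[k] for k in ("Family history of neurological disease",
                                      "CAG expansion in other family members",
                                      "New mutations"))
errors = sum(counts[k] for k in ("Misdiagnosis", "Sample mix-up", "Clerical error"))

for label, n in counts.items():
    print(f"{label}: {n} ({percent(n)}%)")
print(f"Unexplained lack of CAG expansion: {unexplained} ({percent(unexplained)}%)")
print(f"Errors in assignment: {errors} ({percent(errors)}%)")
print(f"All without expansion: {unexplained + errors} ({percent(unexplained + errors)}%)")
```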
In at least 4 cases from 3 families, family studies excluded mutations within the HD gene (IT15) as responsible for this phenotype. These results suggest that on rare occasions non-allelic genetic heterogeneity may underlie the presentation of an HD-like phenotype.

9.4 REFERENCES

1. Duyao M, Ambrose C, Myers R, Noveletto A, Persichetti F, Frontali M, Doistein S, Ross C, Franz M, Abbott M, Gray J, Conneally P, Young A, Penney J, Hollingsworth Z, Shoulson I, Lazzarini A, Falek A, Koroshetz W, Sax D, Bird E, Vonsattel J, Bonilla E, Alvir J, Bickman Conde J, Cha J-H, Dure L, Gomez F, Ramos M, Sanchez-Ramos J, Snodgrass S, deYoung M, Wexler N, Moscowitz C, Penchaszadeh G, MacFarlane H, Anderson M, Jenkins B, Srinidhi J, Barnes G, Gusella J, MacDonald M (1993). Trinucleotide repeat length instability and age of onset in Huntington’s disease. Nature Genet 4:387-392.
2. Snell R, MacMillan JC, Cheadle JP, Fenton I, Lazarou LP, Davies P, MacDonald ME, Gusella JF, Harper PS, Shaw DJ (1993). Relationship between trinucleotide repeat expansion and phenotypic variation in Huntington’s disease. Nature Genet 4:393-397.
3. Andrew SE, Goldberg YP, Kremer B, Telenius H, Theilmann J, Adam S, Starr E, Squitieri F, Lin B, Kalchman M, Graham R, Hayden MR (1993). The relationship between trinucleotide (CAG) repeat length and clinical features of Huntington disease. Nature Genet 4:398-403.
4. Telenius H, Kremer HPH, Theilmann J, Andrew SE, Almqvist E, Anvret M, Greenberg C, Greenberg J, Lucotte G, Squitieri F, Starr E, Goldberg YP, Hayden MR (1993). Molecular analysis of juvenile Huntington disease: the major influence on (CAG)n repeat length is the sex of the affected parent. Hum Mol Genet 2:1535-1540.
5. De Boulle K, Verkerk JMH, Reyniers E, Vits L, Hendrickx J, Van Roy B, Van Den Bos F, de Graaff E, Oostra BA, Willems PJ (1993). A point mutation in the FMR-1 gene associated with fragile X mental retardation. Nature Genet 3:31-35.
6. Gedeon AK, Baker E, Robinson H, Partington MW, Gross B, Manca A, Korn B, Poustka A, Yu S, Sutherland GR, Mulley JC (1992). Fragile X syndrome without CCG amplification has an FMR1 deletion. Nature Genet 1:341-344.
7. Pritchard C, Zhu N, Zuo J, Bull L, Pericak-Vance MA, Vance JM, Roses AD, Milatovitch A, Francke U, Cox DR, Myers RM (1992). Recombination of 4p16 DNA markers in an unusual family with Huntington disease. Am J Hum Genet 50:1218-1230.
8. Robbins C, Theilmann J, Youngman S, Haines J, Altherr MJ, Harper PS, Payne C, Junker A, Wasmuth J, Hayden MR (1989). Evidence from family studies that the gene causing Huntington disease is telomeric to D4S95 and D4S90. Am J Hum Genet 44:422-425.
9. Weber B, Riess O, Wolff G, Andrew S, Collins C, Graham R, Theilmann J, Hayden MR (1992). Delineation of a 50 kb DNA segment containing the site of a recombination event in a sporadic case of Huntington disease. Nature Genet 2:216-222.
10. Hayden MR (1981). Huntington’s chorea. Springer-Verlag, New York.
11. Harper PS (1991). Huntington’s disease. WB Saunders, London.
12. Iizuka R, Hirayama K, Maehara K (1984). Dentato-rubro-pallido-luysian atrophy: a clinico-pathological study. J Neurol Neurosurg Psychiat 47:1288-1298.
13. Iizuka R, Hirayama K (1986). Dentato-rubro-pallido-luysian atrophy. Handbook Clin Neurol 5:437-443.
14. Naito H, Oyanagi S (1982). Familial myoclonus epilepsy and choreoathetosis: hereditary dentatorubral-pallidoluysian atrophy. Neurology 32:798-807.
15. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
16. Telenius H, Kremer HPH, Goldberg YP, Theilmann J, Andrew S, Zeisler J, Adam S, Greenberg C, Ives EJ, Clarke LA, Hayden MR (1994). Somatic and gonadal mosaicism in Huntington disease: Instability of the CAG repeat is tissue specific and most prominent in brain and sperm. Nature Genet 6:409-414.

CHAPTER 10
MOLECULAR ANALYSIS OF SPORADIC CASES OF HUNTINGTON DISEASE

The work in this chapter has contributed to two manuscripts.

Goldberg YP, Andrew SE, Theilmann J, Kremer B, Squitieri F, Telenius H, Brown JD, Hayden MR (1993). Familial predisposition to recurrent mutations causing Huntington’s disease: genetic risk to sibs of sporadic cases. J Med Genet 30:987-990.

Goldberg YP, Kremer B, Andrew SE, Theilmann J, Graham RK, Squitieri F, Telenius H, Adam S, Sajoo A, Starr E, Heiberg A, Wolff G, Hayden MR (1993). Molecular analysis of new mutations for Huntington disease: Intermediate alleles and sex of origin effects. Nature Genet 5:174-179.

10.1 INTRODUCTION

New mutations causing HD are exceedingly rare, with a mutation rate among the lowest known for human genetic diseases [1-3]. This may reflect a truly low mutation rate, but it may also reflect the difficulty of proving a new mutation in HD: the parents of the sporadic case must have lived beyond the expected age of onset of HD without any manifestations of the disease, paternity of the sporadic case must be confirmed, and ideally the disease should be transmitted to the offspring of the sporadic case [4].

The CAG repeat size on 600 control chromosomes from unaffected persons of Caucasian descent ranges from 10 to 29, with a median of 18 repeats; 97.5% of normal chromosomes contain fewer than 28 repeats. The length of the upper allele in a total of 955 patients with HD ranges from 36 to 121, with a median of 44 repeats.

Of the 1050 persons affected with HD (from 650 families), 934 (89%) had an established family history of the disease. Those without an established family history of the disorder (116, 11%) were investigated further, with the goal of identifying the persons most likely to represent new mutations for HD.
On further investigation of the 116 isolated cases, 56 were excluded because either the family history raised the possibility of HD (for example, a first-degree relative with a serious neurological disorder) or HD in a parent could not be convincingly excluded because that parent had died before the age of 60. A further 3 persons were excluded because they were adopted, and 36 because no additional information concerning family history was available. A total of 21 individuals remained who fulfilled the criteria of clearly having the signs and symptoms of HD and having parents who had lived beyond the expected age of onset (>60 years). Only 9 such cases have previously been described in the literature [5-9].

10.2 RESULTS

10.2.1 IDENTIFICATION OF THE HD PREMUTATION

CAG repeat length was assessed in 21 sporadic cases of HD and their families in order to learn more about the molecular events underlying new mutations for HD. In 18 of these families, the sporadic HD patient had an upper allele in the size range seen in patients with this disorder (Tables 10-1A and 10-1B). In all 8 families where DNA was available from the parents (n=5), or where there were sufficient siblings to reconstruct the parental genotypes (n=3), one parental allele was significantly larger than those seen in the general population (>30 repeats) but below the range seen in patients with HD. Alleles in the range of 30-38 repeats have been designated intermediate alleles (IAs).

These IAs are meiotically unstable. In each family the IA expanded in at least one offspring beyond 38 repeats, causing HD (Figure 10-1a-e). However, a small change in repeat size (1-2 repeats) or no change at all was also seen in some unaffected sibs of sporadic patients (Figure 10-1, Table 10-1).
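The working size categories used in this chapter (below 30 repeats for the general-population range, 30-38 repeats for intermediate alleles, above 38 repeats for the HD range) and the distinction between stable, slightly changed and expanded transmissions can be made concrete with a short sketch. The cut-offs mirror this chapter's definitions only, and the 38-40 region is hard to call in practice; the function names are hypothetical. The usage lines take the Family 1 values from Table 10-1A, where a parental IA of about 32 repeats was inferred from haplotypes.

```python
# Working cut-offs as defined in this chapter (illustrative; boundaries are fuzzy in practice).
IA_MIN, IA_MAX = 30, 38  # intermediate alleles (IAs): 30-38 CAG repeats


def classify(repeats: int) -> str:
    """Assign a CAG allele to the chapter's size categories."""
    if repeats < IA_MIN:
        return "normal"
    if repeats <= IA_MAX:
        return "intermediate"
    return "expanded"


def describe_transmission(parent_allele: int, child_allele: int) -> str:
    """Summarise one parent-to-child transmission of an intermediate allele."""
    change = child_allele - parent_allele
    if classify(child_allele) == "expanded":
        return f"{parent_allele}->{child_allele}: expanded into HD range ({change:+d})"
    if change == 0:
        return f"{parent_allele}->{child_allele}: stable"
    return f"{parent_allele}->{child_allele}: small change ({change:+d})"


# Family 1 (Table 10-1A): inferred parental IA of about 32 repeats; four unaffected
# sibs carry 32-33 repeats and the proband carries 43.
parental_ia = 32
for allele in [33, 33, 32, 33, 43]:
    print(describe_transmission(parental_ia, allele))
```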
Since we have shown that the IA undergoesexpansion to the range seen in patients with HD, it was reasoned that the IA is apremutation allele which alone does not cause HD, but predisposes to further mutationwhich may eventually lead to HD.In one of the eight families (Figure 10-lA, Family 1) reconstruction of haplotypesindicates that one parent must have had an allele with approximately 32 repeats, whichwas then transmitted to 4 unaffected offspring as an allele with 32-33 trinucleotiderepeats. One offspring however, received an allele containing 43 repeats and developedHD. The lAs of the 4 sibs and the expanded allele of the sporadic patient were found byhaplotype analysis to be on the same chromosome as the parent with the IA, confirmingthat this IA was unstable and expanded to produce a CAG repeat length in the HO range.207Table 10-la. New mutation famiNes with demonstrated HD gene (GAG) repeat lengthbetween 30-38 (intermediate alleles).Family Patient Age of Clinical Status Transmitting Patemity** Allele sizeonset of Parents * Parent Lower UpperProband 36 Parents alive Unknown Confirmed 20 43Sib Unaffected >70 13 20Sib 18 33Sib 13 20Sib 13 20Sib 13 20Sib 13 20Sib 13 20Sib 20 33Sib 20 32Sib 18 332 Proband 36 Father Confirmed 17 44Father Father died >80 27 32Mother Mother alive >80 16 17Spouse 16 16Child 16 433 Proband 32 Father Confirmed 23 49Father Father alive >80 20 35Sib 17 35Sib 23 354 Proband 32 Father Confirmed 21 43Father Father alive >70 19 30Sib Mother alIve >70 19 305 Proband 35 Father Confirmed 16 53Father Father alive >70 18 37Mother Mother alive >60 16 16Sib 16 38Sib 16 37Sib 16 41Sib 16 18Uncle 17 376 Proband 28 Father Confirmed 22 52Father Father died >60 19 38Mother Mother alive >60 19 227 Proband 45 Unknown Confirmed 21 41Sib Parents died >60 21 38Sib Sibs >70s and no HD 16 36Sib 18 21Sib 21 378 Proband 40 Father Unknown 26 45Mother Father died >60 19 26Sib Mother alive >80 19 34* No parent had any clinical features of RD** Paternity was 
Investigated using highly polymorphic DNA markers and blood serologyIn all instances results were compatible with paternity as shown208Table 10-B. CAG repeat length and clinical details of new mutation familiesFamily Patient Clinical status Allele sizeof Parents * Lower Upper9 Proband Father died >60 16 47Mother Mother alive >80 16 18Child 16 20Child 16 2010 Proband Father died >70 17 47Child Mother alive >70 16 7111 Proband Father died>60 16 43Father-in-law Mother died>80 21 26Mother-in-law 17 21Child 17 4512 Proband Father died>70 20 40Sib Mother died>70 17 2013 Proband Parents alive >60 23 43Spouse 17 19Child 17 40Child 19 4114 Proband Father died >60 19 40Sib Mother died >70 12 1315 Proband Father died >70 11 39Spouse Mother died >70 18 22Child 15 18Child 22 3716 Proband Father died >60 16 43Child Mother died>90 16 1917 Proband Father died >90 15 42Mother died >8018 Proband Father died >60 20 40Mother alive >7019 Proband Father died >70 18 21Mother died >8020 Proband Father >60 13 26Mother Mother >60 22 28Father 16 1821.0 Proband Father died >60 15 19Sib Mother died >60 19 26* No parent had any clinical features of HO209TNR13181313131313202020182033202020202033324333ac bd ac ac ac ac ac ad ad ad/c bdFigure 10-1 (a-e).Autoradiograms showing trinucleotide repeat (TNR) length in families with new mutations. Ineach family, an intermediate allele (IA) is shown, which has expanded in the sporadic case.Pedigrees aie shown and in families 1,3 and 4, haplotypes (a, b, c and ci) are depicted.Figure 10-la. Haplotypes were reconstructed in family 1 for the parental alleles. Meioticinstability is seen as variability of the intermediate allele amongst sibs. The proband has anexpanded allele consistent with HD.210TNR44—43—32—27—17—16-s16 1627171643 16324417I 92Figure 10-lb. In family 2, the intermediate allele of the parent (I-i) expands in the proband (II2) and is passed on to the next generation (rn-i).TNR2112O—-17—2023172335493535cd ac bc acFigure 10-ic. 
Haplotype analysis of family 3 shows that the intermediate alleles and the expanded allele occur on the same chromosome. In contrast to the proband, sibs have inherited the intermediate allele in a stable manner.
Figure 10-1d. Haplotype analysis of family 4 shows that the intermediate alleles and the expanded allele occur on the same chromosome. In contrast to the proband, sibs have inherited the intermediate allele in a stable manner.
Figure 10-1e. Parents were unavailable for family 5. However, analysis of sibs shows an unstable intermediate allele of 36-38 repeats. These sibs are all alive (>70 years old) without features of HD.

Interestingly, the sporadic patient in family 1 was one of the recombinant individuals discussed in Chapter 1, with the breakpoint between D4S111 and D4S141 that led to the hypothesis that the recombination event might underlie the cause of HD in this patient [10]. In 2 other families with established haplotypes (Figures 10-1c and 10-1d, and Table 10-1a, Families 3 and 4), the expanded CAG repeat of the sporadic patient occurs on the same chromosome as the IA of the sibs and the transmitting parent. In another interesting family, the father with the premutation has passed on an expanded allele to two offspring, only one of whom has so far manifested HD (Table 10-1a, Family 5).

10.2.2 PARENTAL SEX OF ORIGIN
The sex of origin of the premutation was determined by examining DNA from both parents. In 5 instances the father was identified as the transmitting parent (Table 10-1a). In two families where parental DNA was unavailable, the sex of the transmitting parent could not be ascertained (Table 10-1a). Furthermore, in two additional families with sporadic HD, the mutation was not inherited from the mother, and paternal origin is therefore implied (Table 10-1b, Family 8). Therefore, in all 7 families in which the parent of origin could be determined or inferred, the new mutation arose on the paternally derived allele.
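The 7/7 figure can be checked with a one-sided binomial calculation. The Python sketch below is illustrative only and assumes, as the text does, that each family represents an independent meiosis in which the new mutation is equally likely to arise on the paternal or the maternal allele; the function name is hypothetical.

```python
from math import comb


def prob_at_least_k_paternal(k: int, n: int, p_paternal: float = 0.5) -> float:
    """One-sided binomial tail P(X >= k) for X ~ Binomial(n, p_paternal).

    Under the null hypothesis of no parental bias, each independent new
    mutation is equally likely to arise on the paternal or maternal allele.
    """
    return sum(
        comb(n, i) * p_paternal**i * (1 - p_paternal) ** (n - i)
        for i in range(k, n + 1)
    )


# 7 of 7 families with a determined or inferred paternal origin
p_value = prob_at_least_k_paternal(7, 7)
print(f"P(7/7 paternal | no bias) = {p_value:.4f}")  # 0.0078
```

With k = n = 7 the sum reduces to the single term (1/2)^7 = 1/128. The same function gives the tail probability for partial counts (for example, 6 or more paternal origins out of 7) should further families be added.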
The probability of obtaining 7 mutant alleles from the father in 7 independent meioses, if there is no bias towards one parent or the other, is (1/2)^7 ≈ 0.0078. This suggests that a paternal allele in the premutation range is more likely to undergo significant expansion to a repeat length in the range seen in patients affected with HD. The 5 unaffected fathers with IAs have upper alleles with repeat lengths ranging from 30 to 38 (mean = 34.2 ± 2.8), which is significantly different from the upper allele in the general population and less than the upper allele seen in HD. However, it should be noted that at the upper end of the IA range it may be difficult to distinguish between a high IA and an upper allele in the lower range (38-40) seen in some patients with HD.

In the remaining 12 patients with presumed new mutations causing HD (Table 10-1b), DNA from both parents was unobtainable and thus the parental origin of the expanded allele could not be identified. In 9 of these 12 instances, however, the person representing a new mutation for HD was found to have an expanded allele consistent with that seen in other affected persons.

10.2.3 NEW MUTATIONS WITHOUT CAG EXPANSION
In three families no expanded upper allele was detected, although second DNA samples were unavailable to exclude sample mix-up. In these families, the repeat lengths of the sporadic patients were 18/21, 13/26 and 15/19, respectively. All three patients had typical histories of HD and were diagnosed as affected by a neurologist. In another suspected new mutation, excluded from this study because the mother died before age 60, the patient, who had classical signs of HD detected clinically and by positron emission tomography, had two alleles of 16 CAG repeats. This suggests, therefore, that in a proportion of these patients genetic heterogeneity underlies HD.
The underlying cause of the HD-like phenotype remains unknown and merits further investigation. Nevertheless, this finding has significant implications for predictive testing programs and clearly indicates the importance of including affected family members in any protocol providing results to at-risk persons. Failure to identify an expanded allele in the affected person would indicate that, in that family, the direct mutation test would not be informative for the offspring.

10.2.4 SPORADIC HD IS TRANSMITTED TO OFFSPRING
Patients with sporadic HD can transmit their expanded CAG repeats to their offspring, who will subsequently develop HD. In family 2 (Figure 10-1b, Table 10-1a), for example, where paternity has been established, the unaffected father transmitted an IA which expanded to 44 repeats in the sporadic affected patient, who in turn passed on an allele of 43 repeats to his offspring, who also developed HD. In another family, there was transmission of HD from the parent to his offspring with massive expansion of the HD allele from 47 to 71 repeats (Table 10-1b, Family 9). In this family paternity has been proven, and both parents lived to an advanced age, beyond 70 and 80 years respectively, without manifesting signs and symptoms of HD. This family is pertinent because the sporadic case developed HD at 30 years, while his child, who inherited the expanded allele of 71 repeats, manifested at 10 years of age, consistent with the previously developed predictive model for age of onset based on repeat length.

10.3 DISCUSSION
These findings have major implications for understanding the sequential molecular events leading to new mutations, and they also have clinical relevance. Patients with suspected HD on clinical criteria without a positive family history are not rare and initially represented 11% of this sample, which was biased towards ascertainment of familial cases. In this sample, 18 of 21 sporadic cases of HD had CAG expansion.
It would appear, therefore, that in most instances the diagnosis of HD can be made in the absence of a positive family history.

It has previously been postulated that the mutation rate for HD is amongst the lowest for all human genetic diseases, and very few sporadic cases had been described [3]. The finding of 21 suspected new mutations in 650 families (3%) reflects the fact that the criteria used here for designation of new mutations were less stringent than those used in previous studies. Thus it is apparent from the results of this analysis that new mutations are not as rare as previously thought, and it is likely that the mutation rate for this disorder has been underestimated.

These findings have significant implications for family members of sporadic cases, in particular for siblings and second-degree relatives of such affected persons. In the past, there was no appreciation that the offspring of unaffected siblings of a sporadic case of HD might indeed have an increased risk of manifesting HD in the future, since they may also inherit an expanded allele (Table 10-1a, Family 5). Similarly, unaffected siblings who carry an IA, in particular male siblings, are also at increased risk of having children with HD. This latter risk would depend on whether the premutation allele undergoes expansion during transmission through the male germline. Female siblings who carry the premutation allele, however, may have a lower probability of passing on an expanded allele resulting in HD in their offspring. Thus, the risk for the children of females carrying the premutation for HD would be considerably lower than that for the offspring of males.

These findings may explain previously puzzling family histories. For example, two siblings of unaffected parents may manifest HD, suggesting recessive inheritance. This pattern is consistent with a parental premutation allele having undergone expansion in both offspring.
Furthermore, apparent skipping of generations may be due to expansion of an inherited premutation in the germline of a sibling of a sporadic affected patient, with consequent clinical manifestations in the niece or nephew. In the past, non-paternity was usually invoked to account for these findings.

The frequency of expansion of premutation alleles in the male germline is unknown. This small series, which shows 10 expansions out of 24 meioses (42%), clearly represents an overestimate, as ascertainment was biased towards those individuals in whom expansion had occurred and who had manifested HD.

To date, only two other HD new mutation families have been analyzed and, consistent with these findings, IAs of 33 and 36 repeats, respectively, were detected in both families [11]. This analysis demonstrates that there is a premutation allele which is unstable and expands to a range associated with HD. Haplotypes of the premutation alleles with markers flanking the CAG mutation demonstrate multiple origins for the predisposing premutation allele, leading to HD chromosomes with different haplotypes (data not shown).
This is consistent with the finding of multiple haplotypes associated with HD [12,13].

In fragile X syndrome, a CCG repeat located in the 5’ untranslated region of the FMR-1 gene ranges in normal persons from 6 to 54 repeats, while expansions greater than 200 repeats are seen in affected individuals [14,15]. Phenotypically normal transmitting males have an intermediate-size CCG repeat of 52 to 100 repeats which is meiotically unstable (premutation) [13,14]. However, in contrast to fragile X, where expansion of the premutation occurs only in transmission through the female germline, expansion of the HD premutation has so far been demonstrated only in the male germline.

In myotonic dystrophy (DM), a CTG repeat in the 3’ untranslated region of the myotonin kinase gene ranges from 5 to 40 repeats in normal persons, while expansions greater than 100 repeats are seen in affected individuals [16-18]. Similar to HD, in DM a premutation (50-100 repeats) is meiotically unstable and expands to the full mutation, but in contrast to HD the expanded allele may be transmitted from either parent [19].

It has previously been shown in myotonic dystrophy and fragile X syndrome that the length of the repeat is the major source of recurrent DNA mutations once the repeat has reached an intermediate range [13,17]. These data also implicate CAG repeat length as one of the factors contributing to instability once the number of repeats reaches a threshold level (30 repeats).

In this analysis, mutations causing HD are more likely on the paternally derived allele. The selective expansion of the CAG triplet repeat in the offspring of males might reflect the previously recognized higher mutation rate in the male than in the female germline [3]. Errors in DNA replication during germ-cell division might be more likely in males, as the number of germ-cell divisions per generation is much greater in males than in females.
It is notable that the fathers who clearly had an IA with germline mutations leading to an expanded allele were of advanced age (mean 36.7 years, range 29-55 years) at the time of birth of the affected offspring. This is similar to the mean paternal age in other autosomal dominant conditions, such as achondroplasia and Marfan syndrome, where a higher mutation rate with increasing paternal age has been demonstrated [3,20-23]. These findings suggest that in HD, advanced paternal age in some undetermined way influences the susceptibility of the premutation to full expansion.

The sporadic recombinant individual from family 1, whose breakpoint between D4S111 and D4S141 led to the hypothesis of a distally located HD gene disrupted in the recombinant individual and resulting in disease [10], shows the same CAG expansion to the affected range as the other sporadic cases examined. Therefore, the recombination event would appear to be unrelated to the development of disease in this individual.

These results provide convincing evidence for a premutation in HD. Factors affecting the susceptibility of the premutation to full expansion include the sex of the transmitting parent as well as paternal age. These findings have significant clinical implications for family members of sporadic patients and will influence counseling practices.

10.4 REFERENCES
1. Hayden MR (1981). Huntington’s Chorea. Springer-Verlag, New York.
2. Harper PS (1991). Huntington’s Disease. WB Saunders, London.
3. Vogel F and Motulsky AG (1986). Human Genetics, 2nd ed. Springer-Verlag, New York.
4. Stevens D and Parsonage MJ (1969). Mutation in Huntington’s chorea. J Neurol Neurosurg Psychiatry 32:140-143.
5. Wolff G, Deuschl G, Wienker TF, Hummel K, Bender K, Lucking C, Schumacher M, Hammer J, Oepen G (1989). New mutation to Huntington’s disease. J Med Genet 26:18-27.
6. Baraitser M, Burn J, Fazzone TA (1983). Huntington’s chorea arising as a fresh mutation. J Med Genet 20:459-460.
7. Shaw M and Caro A (1982).
The mutation rate to Huntington’s chorea. J Med Genet 19:161-167.
8. Chiu E and Brackenridge CJ (1976). A probable case of mutation in Huntington’s disease. J Med Genet 13:75-77.
9. Stevens D and Parsonage M (1969). Mutation in Huntington’s chorea. J Neurol Neurosurg Psychiatry 32:140-143.
10. Weber B, Riess O, Wolff G, Andrew S, Collins C, Graham R, Theilmann J, Hayden MR (1992). Delineation of a 50kb DNA segment containing the recombination site in a sporadic case of Huntington’s disease. Nature Genet 2:216-222.
11. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
12. MacDonald ME, Novelletto A, Lin C, Tagle D, Barnes G, Bates G, Taylor S, Allitto B, Altherr M, Myers R, Lehrach H, Collins FS, Wasmuth JJ, Frontali M, Gusella JF (1992). The Huntington’s disease candidate region exhibits many different haplotypes. Nature Genet 1:99-103.
13. Andrew SE, Theilmann J, Almqvist E, Norremolle A, Lucotte G, Anvret M, Sorensen SA, Turpin JC, Hayden MR (1993). DNA analysis of distinct populations suggests multiple origins for the mutation causing Huntington disease. Clin Genet 43:286-294.
14. Fu Y-H, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJMH, Holden JJA, Fenwick RG Jr, Warren ST, Oostra BA, Nelson DL, Caskey CT (1991). Variation of the CGG repeat at the fragile X site results in genetic instability: resolution of the Sherman paradox. Cell 67:1-20.
15. Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST, Schlessinger D, Sutherland GR, Richards RI (1991). Mapping of DNA instability at the fragile X to a trinucleotide repeat sequence p(CCG)n. Science 252:1711-1714.
16. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, Neville C, Narang M, Barcelo J, O’Hoy K, Leblond S, Earle-MacDonald J, de Jong PJ, Wieringa B, Korneluk B (1992).
Myotonic dystrophy mutation: an unstable CTG repeat in the 3’ untranslated region of a candidate gene. Science 255:1253-1255.
17. Fu Y-H, Pizzuti A, Fenwick RG Jr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasser GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, Caskey CT (1992). An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science 255:1256-1258.
18. Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ, Housman DE (1992). Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3’ end of a transcript encoding a protein kinase family member. Cell 68:799-808.
19. Richards RI and Sutherland GR (1992). Dynamic mutations: a new class of mutations causing human disease. Cell 70:709-712.
20. Penrose LS (1955). Parental age and mutation. Lancet ii:312.
21. Penrose LS (1957). Parental age in achondroplasia and mongolism. Am J Hum Genet 9:167-169.
22. Murdoch J, Walker BA and McKusick VA (1972). Parental age effects on the occurrence of new mutations for the Marfan syndrome. Ann Hum Genet 35:331-336.
23. Vogel F (1977). A probable sex difference in some mutation rates. Am J Hum Genet 29:312-319.

CHAPTER 11
A POLYMORPHIC CCG REPEAT ADJACENT TO THE CAG REPEAT IN THE HUNTINGTON DISEASE GENE

The work presented in this chapter contributed to one manuscript.
Andrew SE, Goldberg YP, Theilmann J, Zeisler J, Hayden MR (1994). A CCG polymorphism adjacent to the CAG repeat in the Huntington disease gene: implications for diagnostic accuracy and predictive testing. Hum Mol Genet 3:65-67.

11.1 INTRODUCTION
Since the discovery of the CAG trinucleotide expansion associated with HD, different PCR approaches have been taken to assess CAG length [1-4].
PCR across this repetitive region is complicated by the presence of two adjacent CCG trinucleotide repeats which, in the initial report, were included in the PCR used to assess CAG length [1]. In the initial report, three clones were sequenced across this region and each was found to contain 7 CCG repeats adjacent to the CAG, suggesting that this CCG repeat was not polymorphic [1]. Therefore, all reported methods have used primers which not only encompass the CAG repeat but also flank the adjacent CCG repeat in normal human chromosomes (A1/A2) (Figure 11-1). In this chapter, the possibility that inclusion of the adjacent CCG repeat may affect estimates of trinucleotide length was examined.

11.2 RESULTS
CAG lengths reported in this thesis were obtained by using primers which encompass the CAG repeat and the adjacent CCG repeat in normal human chromosomes (A1/A2) (Figure 11-1). Results of such PCR are more accurately “estimates of CAG size”, as these assessments of CAG repeat size have been made assuming that the CCG repeat does not vary.

To address whether the CCG repeat is polymorphic, two additional sets of PCR primers were designed: one which flanks the CAG tract alone (C1/C2), and another which encompasses the CCG repeat alone (B1/B2) (Figure 11-1).

PCR conditions using primers B1 and radiolabeled B2 were 2 mM MgCl2, 50 mM KCl, 20 mM Tris pH 8.4, 3.5% formamide, 15% glycerol, 200 µM dNTPs, 10 pmol of each primer and 2.5 U of Taq polymerase.
Thermal cycling conditions were 95°C for 3 min, followed by 30 cycles of 94°C for 1 min, 59°C for 1 min and 72°C for 1 min, with a final extension at 72°C for 7 min. PCR products were resolved on 6% polyacrylamide gels, product sizes were determined by comparison with an M13 sequencing ladder, and the number of CCG trinucleotides was calculated as follows:

Number of CCG = (PCR product size - 60)/3

as there are 60 nucleotides of non-CCG-repeat sequence in the PCR product.

Figure 11-1. 5’ region of the Huntington disease gene showing the polymorphic CAG and CCG repeats. Sequence is shown extending from nucleotide (nt) 335 to 488 according to the published sequence. Primers used for the determination of the estimated CAG (A1/A2), CAG alone (C1/C2) and CCG alone (B1/B2) are indicated by arrows above the corresponding sequence. Those primers with an X have an additional 5’ XhoI site as a tail. Primer names shown in the figure: A1, HD344; A2, HD482; C1, HD344X; C2, HD447X; B1, HD419X; B2, HD482X.

The CCG triplet is seen as 7 copies on approximately 67% of normal human chromosomes (Table 11-1). However, its length may vary between 7 and 12 repeats, with 10 repeats being the second most frequent length (Table 11-1, Figure 11-2). In contrast, analysis of 114 HD patients indicates that the majority (92%) of persons with HD have 7 CCG repeats associated with the expanded CAG repeat. Three patients, however, were homozygous for a CCG repeat length of 10, indicating that the expanded CAG in these instances was segregating together with a CCG trinucleotide of 10 repeats (Table 11-1).
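The difference between the control and HD distributions of CCG alleles summarized in Table 11-1 can be checked with a Pearson chi-square test on the 2 x 5 table of allele counts. This is a minimal sketch, not the software used for the original analysis. It assumes the HD count for 7 CCG repeats is 106, the value implied by the reported 92.17% of 115 HD chromosomes, and it uses the closed-form upper-tail probability that holds for an even number of degrees of freedom, so no statistics library is needed.

```python
import math

# CCG allele counts from Table 11-1 (columns: 7, 9, 10, 11, 12 repeats).
# Assumption: the HD count for 7 repeats is 106 (92.17% of 115 chromosomes).
control = [137, 5, 61, 1, 1]
hd = [106, 0, 9, 0, 0]


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    n_a, n_b = sum(row_a), sum(row_b)
    total = n_a + n_b
    stat = 0.0
    for obs_a, obs_b in zip(row_a, row_b):
        col = obs_a + obs_b
        exp_a = col * n_a / total  # expected count under independence
        exp_b = col * n_b / total
        stat += (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return stat, len(row_a) - 1


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


stat, df = chi_square_2xk(control, hd)
p = chi_square_sf_even_df(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.5f}")  # chi2 = 26.36, df = 4, p = 0.00003
```

With these counts the statistic comes to about 26.36 on 4 degrees of freedom (p ≈ 0.00003). Several expected counts (for the 9, 11 and 12 repeat alleles) are below 5, so the chi-square approximation is rough here; an exact test on the same table would be a more conservative check.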
In a further six patients, heterozygous for CCG alleles of 7 and 10 repeats, the 10-repeat CCG allele was found to segregate with the expanded CAG on the HD chromosome.

Table 11-1. Frequency of CCG alleles in control and HD chromosomes.

CCG allele   Control n   Control %   HD n   HD %
7            137         66.83       106    92.17
9            5           2.44        0      0.00
10           61          29.76       9      7.83
11           1           0.49        0      0.00
12           1           0.49        0      0.00
Total        205         100         115    100

χ² = 26.36, p = 0.00003, df = 4

Figure 11-2. PCR amplification of the CCG repeat showing alleles of 7, 9, 10, 11 and 12 repeats. PCR products have been resolved on 6% polyacrylamide gels and sized against an M13 sequencing ladder.

11.3 DISCUSSION
The finding of a polymorphic CCG repeat has significant implications for the assessment of CAG repeat length in persons with symptoms suggestive of HD and also for candidates participating in predictive testing programs. If laboratories continue to use primers which encompass both the CAG and the CCG repeats (“estimated CAG”), then in persons who have an “estimated CAG” repeat length of between 37 and 42, it will be necessary to distinguish between CAG and CCG repeat length in order to assess accurately the contribution of CAG expansion alone.

In the past, using primers A1 and A2 in a patient who, for example, had a total PCR product size of 45 repeats, subtraction of 7 CCG repeats would give an “estimated CAG” size of 38. In this cohort of 1022 affected persons, the smallest HD allele has an expanded CAG repeat length of 36 repeats. A repeat length of 38 is therefore within the range seen in affected persons with HD, implying that this person has an expanded CAG repeat consistent with having inherited the mutation underlying HD. However, if such a person in fact had a CCG repeat length of 12, the actual CAG repeat length would be only 33, which is below the range seen in affected persons with HD.
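The arithmetic of the worked example, together with the rule that an “estimated CAG” of 37-42 calls for separate CAG and CCG measurements, can be sketched in a few lines of Python. The constants come from the text (60 non-CCG nucleotides in the B1/B2 product, 7 CCG repeats assumed by the A1/A2 method, a smallest HD allele of 36 CAG repeats in this cohort); the function names and the 96-nucleotide B1/B2 product used in the example are hypothetical.

```python
# Hypothetical helpers illustrating the arithmetic in this chapter; the
# constants are taken from the text, not from any laboratory software.
NON_CCG_NT = 60            # non-CCG nucleotides in the B1/B2 product
ASSUMED_CCG = 7            # CCG length assumed by the "estimated CAG" method
SMALLEST_HD_CAG = 36       # smallest expanded CAG allele seen in this cohort
GREY_ZONE = range(37, 43)  # "estimated CAG" of 37-42 inclusive


def ccg_repeats(b1_b2_product_nt: int) -> int:
    """Number of CCG = (PCR product size - 60) / 3."""
    repeats, remainder = divmod(b1_b2_product_nt - NON_CCG_NT, 3)
    if remainder:
        raise ValueError("product size is not a whole number of triplets")
    return repeats


def estimated_cag(total_repeats: int) -> int:
    """A1/A2 result (CAG + CCG repeat units) minus the assumed 7 CCG repeats."""
    return total_repeats - ASSUMED_CCG


def true_cag(total_repeats: int, ccg: int) -> int:
    """CAG length once the CCG length has been measured directly."""
    return total_repeats - ccg


total = 45                       # worked example from the text
est = estimated_cag(total)       # 38
needs_ccg = est in GREY_ZONE     # True: CCG must be measured directly
ccg = ccg_repeats(60 + 3 * 12)   # hypothetical 96-nt B1/B2 product -> 12 CCG
cag = true_cag(total, ccg)       # 33
print(est, needs_ccg, cag, cag >= SMALLEST_HD_CAG)  # 38 True 33 False
```

Keeping the grey zone as an explicit range makes the decision rule easy to audit: any “estimated CAG” inside it is flagged for direct CCG sizing, as the text recommends.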
In this particular instance, measurement of CCG repeat length is critical for reaching an accurate conclusion, whether for confirmation of diagnosis or for provision of an accurate risk in predictive testing.

The CCG polymorphism may, however, complicate assessment of CAG size in only a small number of instances. Until now, the approach to determining trinucleotide expansion was to estimate CAG length by PCR encompassing both the CAG and CCG repeats and to subtract 7 CCG repeats, as the CCG repeat was thought not to be polymorphic. Even though direct assessment of CCG length in the vast majority of persons with HD (92%) will yield a result of 7 CCG repeats, one cannot assume that this is always the case. Moreover, approximately 33% of normal chromosomes carry a CCG allele of more than 7 repeats. Therefore, in those instances in which the CCG length might influence the estimate of risk (persons with an “estimated CAG” between 37 and 42), direct assessment of CCG length becomes imperative in order to give the patient the best estimate of whether or not they have inherited a mutation associated with HD. Thus, for CAG length estimates between 37 and 42, CAG and CCG analysis would be performed independently. Alternatively, an initial PCR across the CAG alone in all patients would circumvent the need for additional PCRs. However, at present this PCR is much less robust than amplification of the “estimated CAG”, and requires further optimization prior to routine use.

Accurate assessment of CAG repeat size is therefore clearly important, both for the correct evaluation of symptoms in persons with an “estimated CAG” size of 37-42 and for persons at risk who have not inherited the HD chromosome but have 10-12 CCG repeats, which might otherwise lead to the false interpretation that they have inherited the HD mutation.

The relationship between age of onset and CCG length was also assessed.
Age of onset was available for 57 individuals with an HD allele containing 7 CCG repeats (mean = 42.1 years, range 14-65) and for the 9 individuals with an HD allele containing 10 CCG repeats (mean = 44.1 years, range 35-64). Comparison between these two groups showed no significant difference in age of onset. Thus, although both the CAG and the CCG repeats are polymorphic on normal human chromosomes, only the CAG repeat is susceptible to significant expansion, which in some unknown way is associated with the phenotype of HD.

Recently, Rubinsztein et al. [5] have reported similar findings of polymorphism of the CCG repeat, identifying 4 alleles and emphasizing the importance of this repeat in determining accurate CAG repeat length in some HD individuals.

11.4 REFERENCES
1. Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington disease chromosomes. Cell 72:971-983.
2. Goldberg YP, Andrew SE, Clarke LA, Hayden MR (1993). A PCR method for accurate assessment of trinucleotide repeat expansion in Huntington disease. Hum Mol Genet 2:635-636.
3. Riess O, Noerremoelle A, Soerensen SA, Epplen JT (1993). Improved PCR conditions for the stretch of (CAG)n repeats causing Huntington disease. Hum Mol Genet 2:637.
4. Valdes JM, Tagle DA, Elmer LW, Collins FS (1993). A simple non-radioactive method for diagnosis of Huntington disease. Hum Mol Genet 2:633-634.
5. Rubinsztein DC, Barton DE, Davison BC, Ferguson-Smith MA (1993). Analysis of the huntington gene reveals a trinucleotide-length polymorphism in the region of the gene that contains two CCG-rich stretches and a correlation between decreased age of onset of Huntington disease and CAG repeat number. Hum Mol
Genet 2:1713-1715.

CHAPTER 12
DISCUSSION

12.1 SUMMARY OF RESULTS
The identification of an expanded CAG repeat associated with HD ended a long search for the HD mutation [1]. The work presented in this thesis comprises refinement of the candidate region by analysing the 6 Mb candidate region for markers demonstrating allelic association with the HD gene; discovery of an Alu retrotransposition event in the proximal candidate region associated with HD in two families; and genetic analysis of the CAG repeat, providing further insights into the nature of dynamic mutations.

12.1.1 ALLELIC ASSOCIATION
Conflicting results from patients with recombinant chromosomes made the search for the HD gene an onerous task. In order to further reduce the candidate region for the HD gene, linkage disequilibrium was examined across the 6 Mb candidate region, and two regions of association, separated by 3 Mb, were identified.

Measures of linkage disequilibrium depend on accurate allele frequencies, so the selection of control chromosomes that accurately reflect the allele frequencies of the population from which the HD alleles are sampled is an important consideration. The HD individuals and their unaffected spouses analyzed in Chapter 3 were of mixed ancestry, primarily of UK descent. In order to investigate more homogeneous populations, three homogeneous populations were analyzed in Chapter 4 for allelic association. No association was seen with markers from either candidate region. Haplotype analysis demonstrated several distinct haplotypes within each population, which could account for the inability to measure any allelic association.
The existence of multiple haplotypes within each homogeneous population suggests several origins for the HD chromosomes within each population, and thus multiple haplotypes underlie the worldwide occurrence of HD.

The reason for the allelic association observed with distal markers, 3 Mb from the CAG expansion, has yet to be resolved despite the identification of the expanded CAG repeat associated with HD [1]. Allelic association is highly dependent on the allele frequencies of the disease and control populations, and it is possible that the results are a statistical artifact due to controls not matched rigorously enough to the HD population. Multiple haplotypes within homogeneous populations that are identical at distal markers by chance may also explain the disequilibrium detected between HD and these distal markers. Another factor that could have influenced the measures of allelic association was the inadvertent inclusion of cases that were later determined to be misdiagnoses, sample mix-ups or cases with other HD-like disorders. However, only one individual not demonstrating CAG expansion was included in previous linkage disequilibrium analyses, and removal of results from this individual does not alter the significance of the measures of disequilibrium (data not shown).

12.1.2 PATTERNS OF ALLELIC ASSOCIATION AROUND THE CAG REPEAT
The CAG repeat associated with HD is located within a novel gene situated 120 kb from the marker D4S95 [1]. Analysis of association with markers distributed over 200 kb and flanking the HD gene showed a pattern of increasing allelic association measures with respect to genomic distance from the CAG repeat. Haplotype analysis with these markers confirmed that multiple haplotypes underlie HD.
The major HD haplotypes have mean CAG lengths larger than expected on normal chromosomes. This is consistent with the hypothesis that repeat length is associated with instability, and that chromosomes with CAG lengths in the upper part of the normal range are prone to expansion leading to HD chromosomes.

12.2.3 GENOMIC REARRANGEMENT ASSOCIATED WITH HD
Prior to the identification of the HD gene, a method of identifying transcribed sequences, developed by Dr. Rommens and termed “Gene Tracking”, identified 53 transcribed clones from the proximal candidate region. One of the Gene Tracked clones, located close to the marker D4S95, which demonstrates strong linkage disequilibrium with HD, identified a genomic rearrangement cosegregating with HD in 2 families. The rearrangement was mapped, cloned and sequenced, and identified as an Alu element retrotransposition event.

After identification of the expanded CAG repeat associated with HD by the Huntington Disease Collaborative Research Group, the Alu insertion was localized 190 kb from the site of the CAG repeat. The affected individuals with the Alu insertion were shown to have expanded CAG repeat lengths similar to those seen in other HD patients. Whether this Alu insertion event is a factor in the instability of the CAG repeat, whether the insertion is an effect of instability of the chromosome resulting from the CAG expansion, or whether the two events were independent remains unknown. Alu elements are known to promote recombination, suggesting that the insertion event described in this study could have triggered the expansion of the CAG repeat.
However, both the size of the CAG repeat and the presentation of the disease in the families with the Alu insertion are no different from those seen in other Huntington disease patients, suggesting that, most likely, expansion in the two families with the Alu insertion is due to the same mechanism as in other Huntington disease families.

12.2.4 CAG REPEAT ANALYSIS
A PCR assay was established that allowed rapid, reliable analysis of the length of the CAG repeat in HD patients and their families. Analysis of the CAG repeat demonstrated a significant relationship between CAG repeat length and age of onset of disease, with CAG length accounting for approximately 50% of the variation in age of onset. CAG repeat lengths were examined in patients from 43 different countries and 5 races, and CAG expansion was seen to underlie HD worldwide. Thirty individuals did not have CAG expansion; they represent either genetic heterogeneity of HD or errors of assignment such as misdiagnosis, sample mix-up or clerical error.

Inclusion of alleles from these cases of misdiagnosis, sample mix-up, clerical error and unexplained HD-like phenotypes in allelic association analyses was not a factor affecting the search for the HD gene using measures of disequilibrium. However, CAG analysis of the families with informative recombination events that had previously pointed to conflicting locations for the HD gene allowed the discrepancies in the earlier data to be resolved. For example, two of the recombinant chromosomes that suggested a distal location for the HD gene were from individuals lacking CAG expansion (Chapter 9, patients 5 and 6). The reason for the HD-like phenotype in these patients has not yet been determined, and their diagnosis is now in question.

New mutations were determined to be derived from an intermediate-sized “premutation” allele in an unaffected father. The allele expands in the sporadic case to the range associated with HD.
A sporadic patient with a recombinant chromosome (Chapter 10, Family 1), which had prompted the hypothesis that the recombination breakpoint would identify the gene, has a CAG size in the expected affected range. The affected allele has expanded from an intermediate-sized “premutation” allele in the unaffected parent to an allele within the HD range, similar to what was observed in other sporadic cases. This suggests that the recombination event did not cause the expansion in this patient.

12.3 FURTHER INVESTIGATIONS
The identification of the CAG trinucleotide repeat associated with Huntington disease has ended the search for the Huntington disease mutation. Future goals are to understand the mechanism of expansion and how an expansion present in all tissues causes such specific neuronal death, resulting in disease.

The CAG repeat is located in the 5’ coding region of a large gene coding for a protein with no known function. Using the criteria established by Muller to categorize classes of mutant Drosophila alleles [2], the complete dominance of Huntington disease is most likely explained by classifying the Huntington disease mutation as a neomorph. Neomorphic alleles are often called gain-of-function mutations, as they result either in an altered gene product that is functionally different from the normal product, or in a normal product that is produced at the wrong place or time during development owing to an error in regulation. How expansion of the CAG repeat alters the function of the HD protein remains to be determined.

It is not yet known what initiates expansion of the CAG repeat. Linkage disequilibrium and haplotype analysis suggest that a few specific haplotypes are prone to further expansion, resulting in Huntington disease chromosomes. Identification of markers more closely flanking the CAG repeat will provide more accurate haplotype information and will show whether a multi-step process, as seen in FRAXA and DM, occurs in Huntington disease [3,4].

12.4 REFERENCES
1.
Huntington Disease Collaborative Research Group (1993). A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington disease chromosomes. Cell 72:971-983.
2. Muller HJ (1932). Further studies on the nature and causes of gene mutations. In Proceedings of the 6th International Congress of Genetics, Vol. 1, pp 213-255. Ithaca, New York.
3. Richards RI, Holman K, Friend K, Kremer E, Hillen D, Staples A, Brown WT, Goonewardena P, Tarleton J, Schwartz C, Sutherland GR (1992). Evidence of founder chromosomes in fragile X syndrome. Nature Genet 1:257-260.
4. Imbert G, Kretz C, Johnson K, Mandel JL (1993). Origin of the expansion mutation in myotonic dystrophy. Nature Genet 4:72-76.
