UBC Faculty Research and Publications

Comprehensive whole genome sequence analyses yields novel genetic and structural insights for Intellectual… Zahir, Farah R; Mwenifumbo, Jill C; Chun, Hye-Jung E; Lim, Emilia L; Van Karnebeek, Clara D M; Couse, Madeline; Mungall, Karen L; Lee, Leora; Makela, Nancy; Armstrong, Linlea; Boerkoel, Cornelius F; Langlois, Sylvie L; McGillivray, Barbara M; Jones, Steven J M; Friedman, Jan M; Marra, Marco A May 24, 2017

Full Text

RESEARCH ARTICLE Open Access

Comprehensive whole genome sequence analyses yields novel genetic and structural insights for Intellectual Disability

Farah R. Zahir1,2,5*, Jill C. Mwenifumbo1, Hye-Jung E. Chun1, Emilia L. Lim1, Clara D. M. Van Karnebeek3, Madeline Couse2, Karen L. Mungall1, Leora Lee2, Nancy Makela2, Linlea Armstrong4, Cornelius F. Boerkoel4, Sylvie L. Langlois4, Barbara M. McGillivray4, Steven J. M. Jones1, Jan M. Friedman2† and Marco A. Marra1,2†

Abstract

Background: Intellectual Disability (ID) is among the most common global disorders, yet etiology is unknown in ~30% of patients despite clinical assessment. Whole genome sequencing (WGS) is able to interrogate the entire genome, providing potential to diagnose idiopathic patients.

Methods: We conducted WGS on eight children with idiopathic ID and brain structural defects, and their normal parents, carrying out extensive data analyses using standard and discovery approaches.

Results: We verified de novo pathogenic single nucleotide variants (SNVs) in ARID1B c.1595delG and PHF6 c.820C > T, potentially causative de novo two base indels in SQSTM1 c.115_116delinsTA and UPF1 c.1576_1577delinsAA, and de novo SNVs in CACNB3 c.1289G > A and SPRY4 c.508T > A, of uncertain significance. We report results from a large secondary control study of 2081 exomes probing the pathogenicity of the above genes. We analyzed structural variation using four different algorithms, including de novo genome assembly. We confirmed a likely contributory 165 kb de novo heterozygous 1q43 microdeletion missed by clinical microarray. The de novo assembly unmasked hidden genome instability that was missed by standard re-alignment based algorithms.
We also interrogated regulatory sequence variation for known and hypothesized ID genes and present useful strategies for WGS data analyses for non-coding variation.

Conclusion: This study provides an extensive analysis of WGS in the context of ID, providing genetic and structural insights into ID and yielding diagnoses.

Keywords: Intellectual Disability, Whole genome sequencing, ARID1B, PHF6, SPRY4, CACNB3, SQSTM1, UPF1, 1q43 microdeletion, Genome assembly

* Correspondence: farahz@cfri.ca
†Equal contributors
1Canada’s Michael Smith Genome Sciences Center, Vancouver, BC V5Z 4S6, Canada
2Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Zahir et al. BMC Genomics (2017) 18:403 DOI 10.1186/s12864-017-3671-0

Background

Intellectual Disability (ID) affects 1–3% of the global population. A significant proportion of ID is caused by genetic defects, yet despite extensive testing, including by clinical chromosomal microarray (CMA), ~30% of cases remain idiopathic [1].

Genome-wide sequencing can identify previously unknown genes causative for ID. Whole exome sequencing (WES) is limited by poor ability or inability to detect non-coding and structural variation, and by capturing less than 100% of the exome [2]. In contrast, whole genome sequencing (WGS) offers a comprehensive screen of a variety of DNA variation types. Current evidence suggests WGS is able to detect coding variants in 42% of cases missed by WES [2].

We report comprehensive WGS analyses for eight patients with ID and brain malformations, whose family history suggested a de novo mutation. Despite a diagnostic odyssey, including genome-wide clinical and research CMA, they remained idiopathic. WGS was conducted on trios composed of the affected child and both unaffected parents (average 34X coverage), and data were analyzed using both alignment and assembly approaches to detect all possible causative genetic changes: single nucleotide variants (SNVs) and indels, copy number variants (CNVs) and structural variants (SVs) (Fig. 1). We validated our findings using WES data from an independent positive control cohort of 2081 patients with ID and other neurocognitive phenotypes, and a negative control WGS cohort of 2535 normal subjects. In addition, we probed molecular themes indicated by our discovery cohort findings in the positive control cohort, leveraging its large size. We also conducted a screen for de novo variants in possible regulatory sequences of known and hypothesized pathogenic genes.

Methods

Subjects

Patients were enrolled from the British Columbia Children’s and Women’s Hospital Provincial Medical Genetics Program after obtaining informed consent. This study is approved by the British Columbia Children’s and Women’s Hospital research ethics boards. All patients presented with ID (moderate to severe) and brain morphological defects detected by MRI or CT scan.
Patients had no family history of ID, and all were products of normal pregnancies with no reported teratogenic exposures, as ascertained by clinical assessment by board certified Medical Genetics specialists at the recruiting facility. Saliva samples were collected from child, father and mother, and DNA was extracted using DNA Genotek® collection kits, reagents and protocols.

Methods

WGS methods, variant calling protocols, verification methods, and secondary control study methods, including bootstrap analysis, are summarized below and detailed in Additional file 1. Briefly, DNA was extracted using DNA Genotek® extraction kits. Paired-end WGS libraries were prepared using Illumina’s PCR-free protocol (TruSeq DNA Sample prep kit, Illumina Catalogue Number FC-121-1002). Sequencing was on the Illumina HiSeq 2500 platform (v3 chemistry), generating 100 bp paired-end reads, using three lanes per sample (34X average coverage across all samples). Alignment and variant calling used Canada’s Michael Smith Genome Sciences Center standard pipelines (Additional file 1; reference genome hg19).

Fig. 1 Schematic of complete study design. Abbreviations: CNV = copy number variant; SV = structural variant; SNV = single nucleotide variant; DDD = Deciphering Developmental Disorders study; UPP = Ubiquitin Proteolysis Pathway; IGV = Integrative Genomics Viewer; DGV = Database of Genomic Variants

Variants were identified and filtered as follows. Putative SNVs were identified using SAMtools mpileup version 0.1.17 run on each sample separately. Relatedness was tested for each trio by comparing SNP concordance between child, mother and father using vcftools-0.1.14 [3] (Additional file 2: Table S1). De novo variants were selected by intersecting the child’s SNVs with those of each parent, and selecting variants present only in the child and not in either parent.
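The trio comparison described above can be sketched as a set difference. This is a minimal illustration, not the authors' pipeline: the function name `select_de_novo`, the `(chrom, pos, ref, alt)` tuple representation, and the example variants are all hypothetical. It also assumes that absence from a parent's call set means the parent does not carry the variant, which real data cannot guarantee at low-coverage sites (one reason candidates in this study were Sanger verified).

```python
def select_de_novo(child, mother, father):
    """Return variants called in the child but in neither parent.

    Each variant is a hypothetical (chrom, pos, ref, alt) tuple; the
    representation is an assumption made for this sketch only.
    """
    return set(child) - set(mother) - set(father)


# Hypothetical call sets, not taken from the study.
child = {("1", 1000, "A", "T"), ("2", 2000, "C", "G"), ("3", 3000, "G", "A")}
mother = {("1", 1000, "A", "T")}
father = {("2", 2000, "C", "G"), ("4", 4000, "T", "C")}

de_novo = select_de_novo(child, mother, father)
print(sorted(de_novo))
```

Keying on the full tuple rather than on position alone keeps a child allele that differs from a parental allele at the same site as a candidate.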
For variants in the coding region, we selected de novo missense, nonsense and splicing variants, i.e., functional variants. We next selected rare variants by excluding alleles with minor allele frequency >1% in dbSNPv135 (excluding disease associated variants), Exome Variant Server, Exome Aggregation Consortium (ExAC), and in-house databases of >7430 exomes and >3000 genomes (at Canada’s Michael Smith Genome Sciences Center and the British Columbia Children’s Hospital Research Center, available via open-source access [4]). We then used pathway enrichment analyses to prioritize de novo rare variants, selecting SNVs in genes enriched in pathways involved in brain development and function. These analyses were conducted using QIAGEN’s Ingenuity® Pathway Analysis (IPA), DAVID (https://david-d.ncifcrf.gov/ 6.7) and Panther (http://pantherdb.org/). For those variants passing the pathway enrichment screen, pathogenicity predictions and conservation scores were annotated using SIFT [5], PhyloP [6], PolyPhen [7], MutationTaster [8] and CADD [9] scores. These steps yielded de novo, functional, rare variants that are highly conserved, predicted to be damaging, and in biologically relevant pathways. In addition to the above prioritization, the rare functional variants were also screened under a series of additional genetic models (e.g. compound heterozygous, de novo heterozygous, homozygous recessive, hemizygous recessive), and manually checked for alignment quality with the Integrative Genomics Viewer (IGV, https://www.broadinstitute.org/software/igv). SNVs that were highly conserved and predicted to be damaging by at least one pathogenicity prediction program were selected for verification by Sanger sequencing in the child, mother and father.

CNV analysis was conducted using FREEC [10], CNAseq [11], DELLY [12] and ABySS [13]. The first three algorithms align reads to the reference genome, while ABySS uses de novo assembly to reconstruct the patient’s genome.
SV analysis was conducted using only DELLY and ABySS. First, de novo CNVs/SVs were identified by comparing the child’s data to that of either parent (Additional file 1). De novo CNVs from each algorithm were filtered by manual assessment of local read configuration on IGV, and genuine ones were prioritized based on functional relevance of the included/CNV-affected genes. SVs, i.e., translocations and inversions, were filtered first by IGV read visualization and then by QC metrics specific to each algorithm: QC metrics generated by the program were used for DELLY, and BLAT scores for breakpoint-junction contigs and the number of supporting reads were used for ABySS. Candidate CNVs/SVs selected by the above filtering were verified using an independent method as detailed below.

All de novo variants, i.e., SNVs, CNVs and SVs, were verified by Sanger sequencing of PCR-captured amplicons of the affected sequence, either bearing the SNV or spanning the breakpoint junction (in the case of CNVs and SVs), in the trio, with forward and reverse primers (Additional file 3: Table S2). All verified candidate SNVs were subjected to genotype-phenotype correlation assessment as per the guidelines of the American College of Medical Genetics (ACMG) [14].

Secondary control study: WES data from the UK10K project [15] for 2081 patients with neurofunctional phenotypes (available clinical data for the projects that comprise this cohort are found in Additional file 4: Table S3), and WGS data from 2535 normal individuals from the 1000 Genomes project [16], a publicly available repository of variation in healthy individuals, were obtained. ‘Possibly damaging SNVs’ (PDSs) were extracted from these datasets (as detailed in Additional file 1 and Additional file 5: Figure S2), and a gene-wise PDS burden for all genes in the human genome was determined in both the positive and negative control cohorts.
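As a rough illustration of how a gene-wise burden could be tallied in two cohorts and then contrasted, the sketch below counts distinct carriers per gene and applies a two-sided Fisher's exact test (the test named in the Fig. 3 legend) to a 2x2 carrier table. The burden definition used here (number of individuals with at least one PDS in the gene) is an assumption; the study's exact definition is in Additional file 1. The function names, sample identifiers, gene labels and cohort sizes are hypothetical.

```python
from collections import defaultdict
from math import comb


def carriers_per_gene(pds_calls):
    """Count distinct individuals with at least one PDS in each gene.

    pds_calls: iterable of (sample_id, gene) pairs (assumed input format).
    """
    carriers = defaultdict(set)
    for sample_id, gene in pds_calls:
        carriers[gene].add(sample_id)
    return {gene: len(samples) for gene, samples in carriers.items()}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical toy cohorts of four individuals each.
positive_calls = [("p1", "GENE_A"), ("p1", "GENE_A"), ("p2", "GENE_A"),
                  ("p3", "GENE_A"), ("p4", "GENE_B")]
negative_calls = [("n1", "GENE_A"), ("n2", "GENE_B")]
n_pos, n_neg = 4, 4

pos = carriers_per_gene(positive_calls).get("GENE_A", 0)
neg = carriers_per_gene(negative_calls).get("GENE_A", 0)
p_value = fisher_exact_two_sided(pos, n_pos - pos, neg, n_neg - neg)
print(pos, neg, round(p_value, 4))
```

Counting carriers rather than variants keeps a single individual with several PDSs in one gene from inflating that gene's burden.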
Subsequently, the gene-wise PDS burden in our candidate genes alone was compared between the positive and negative control cohorts. We further bootstrapped the positive control cohort to determine whether the PDS burden in our six candidate genes could be due to random sampling. Finally, we asked which functional pathways involve genes with PDSs in the positive control cohort by conducting a Kyoto Encyclopedia of Genes and Genomes (KEGG [17]) pathway enrichment analysis, testing which of the total 57 functional pathways from KEGG were most enriched for genes bearing PDSs in this large dataset.

Regulatory region variation: For our regulatory region analysis we selected ‘high confidence’ de novo SNVs, defined as having a mapping quality > 30 and read depth ≥ 10 and ≤ 100, and ‘high confidence’ de novo CNVs, defined as those detected by two or more CNV detection algorithms. We then intersected both the de novo high confidence SNVs and CNVs with six non-coding sequence annotation datasets. The results, i.e., de novo high confidence SNVs and CNVs involving putative regulatory regions, were then intersected with candidate gene lists and appropriate flanking sequence (Additional file 1) to determine their possible association with a known or hypothesized candidate ID gene.

Results

De novo SNVs identified by objective molecular pathway-based filtration

Genes with functional rare de novo SNVs were screened using three pathway analysis programs (IPA, DAVID and Panther) in order to refine candidates involved in brain development and function; IPA returned 17 candidate genes, DAVID returned 23, and Panther returned 9. The combined analyses yielded a total of 23 unique genes involved in brain development and function (i.e., found by at least one of the programs).
From these, highly conserved and predicted damaging SNVs (11 SNVs in 11 genes in six patients) were Sanger tested, and six SNVs in six genes in five children were confirmed as heterozygous de novo (Table 1): ARID1B [MIM:614556] NM_017519:c.1595delG (p.G532fs), PHF6 [MIM:300414] NM_001015877:c.820C > T (p.R274X), SPRY4 [MIM:607984] NM_001127496:c.508T > A (p.C170S), CACNB3 [MIM:601958] NM_001206915:c.1289G > A (p.R430Q), SQSTM1 [MIM:601530] NM_003900:c.115_116delinsTA (p.A39fr*#) and UPF1 [MIM:601430] NM_002911:c.1576_1577delinsAA (p.A526N). The latter two were found in a single patient, while the rest each appeared in a separate patient. As best practice, we also screened our de novo rare functional variants for location within published known [2] and candidate ID genes [18]; however, this yielded no new findings. Except for ARID1B and PHF6, the genes are novel for ID. Table 1 provides variant classification as per the ACMG variant interpretation guidelines [14] (see Additional file 6: Table S4 for detailed classification of variants) and our interpretation of their causative effect. Brief genotype-phenotype correlations are given below.

ARID1B c.1595delG (p.G532fs) in Patient 43

This single base deletion in exon 2 of the known ID gene ARID1B causes a frame-shift leading to predicted loss of function (LoF, Additional file 5: Figure S1). Our patient presents with ID, autism, absence of corpus callosum, absence of speech, feeding difficulties and failure to thrive (Table 1). Haploinsufficiency of ARID1B was reported to cause corpus callosum abnormalities, ID, speech impairment and autism [19], suggesting the ARID1B LoF is causative and sufficient in this case.

PHF6 c.820C > T (p.R274*) in Patient 58

PHF6 encodes the plant homeodomain finger protein 6. The nonsense variant in PHF6 is located in the ePHD2 domain, in which causative de novo truncating and missense variants for Börjeson-Forssman-Lehmann syndrome (BFLS) [MIM:301900] [20] and Coffin-Siris syndrome (CSS) [MIM:135900] [21] are known.
De novo truncating and other mutations in PHF6 are reported to cause a distinct syndrome in girls [22] and a female specific form of BFLS [23]. Roles for PHF6 are reported in the chromatin remodeling SWI/SNF complex [24], and in the NuRD epigenetic regulatory complex, where it acts as a possible regulator for the latter in neurogenesis [25]. RNAi knock down of PHF6 profoundly impairs neuronal migration in vivo [26], leading to formation of white matter heterotopias. In keeping with this, this patient has pachygyria, which results from abnormal migration of neurons in the developing brain. She also presents with an unusual asymmetrical growth phenotype that was reported in the one patient with the female specific BFLS [23]. These data indicate the variant is a good candidate in this case.

SPRY4 c.508T > A (p.C170S) in Patient 59

SPRY4 encodes a specific inhibitor of the mitogen-activated protein kinase family. Spry4 is expressed in the developing mouse brain [27], and is essential for the normal morphogenesis and cytoarchitecture of the cerebellum [28]. Morphogenic changes in axon growth have been shown when the protein is down regulated both in vivo and in vitro [29]. In zebrafish, spry4 is a principal regulator of mid-brain development [30], and mediates hindbrain patterning [31]. These data support the notion that the SPRY4 missense variant may contribute to the brain morphological phenotype in this patient. Spry4 expression plays a role in Xenopus limb bud development [32], which is of note as our patient has short and crowded toes.

CACNB3 c.1289G > A (p.R430Q) in Patient 45

CACNB3 encodes a regulatory subunit of a voltage-dependent calcium channel (VDCC). Mice lacking Cacnb3 presented visual impairment [33], high pain threshold [34], and behavioral phenotypes [34], all features seen in this patient. Mutations in other VDCC subunit encoding genes are known to cause neurological disease, including epilepsy [35], which is present in our patient.
This variant is found in eight of 60,165 individuals in the ExAC database; because it is not absent from controls, it cannot be classified as likely pathogenic under ACMG criteria, despite being de novo and deleterious by multiple lines of computational evidence. Nor does it meet criteria for a benign variant, and it is therefore of uncertain significance.

Table 1 Patient phenotype and variant summary

Fields: Patient #; Approx age at examination (a); Phenotype; Gene; Exonic function; AA change; Chr; Co-ordinate (hg19); Other info; ACMG classification system; Comment; Other reports of same variant

Patient 43
  Approx age at examination: Less than 5 years old
  Phenotype: Feeding problems and failure to thrive, global developmental delay, Autism. Height 25%ile, weight -3SD, OFC 2-10%ile. CT/MRI - Dysgenesis of the corpus callosum.
  Variant: Gene: ARID1B; Exonic function: Frame-shift single base deletion; AA change: NM_017519:c.1595delG:p.G532fs; Chr: 6; Co-ordinate: 157150413; Other info: het; ACMG: PVS1, PS2, PM2 = Pathogenic; Comment: Sufficient to cause disease

Patient 58
  Approx age at examination: Less than 5 years old
  Phenotype: Developmental delay. Subtle growth difference involving whole left side. Height 75%ile, weight 25%ile, OFC 66th %ile. MRI - hemimegalencephaly and hypertrophy on one side. Mild dilation of lateral ventricles, mildly smaller left hemisphere with suggestion of pachygyria near anterior temporal lobes.
  Variant: Gene: PHF6; Exonic function: Stop gain SNV; AA change: NM_001015877:c.C820T:p.R274aX; Chr: X; Co-ordinate: 133549136; Other info: het; ACMG: PVS1, PS2, PM2, PP3 = Pathogenic; Comment: Sufficient to cause disease; Other reports of same variant: COSM144567, COSM1134629

Patient 59
  Approx age at examination: Between 10 and 15 years old
  Phenotype: Moderate developmental delay, facial dysmorphisms, seizure reported at 12 years. Enlarged labia. Self-abusive when angry. Height <25%ile, weight between 50th and 75th %ile. OFC 25th %ile. CT - mild ventriculomegaly.
  Variant: Gene: SPRY4; Exonic function: Nonsynonymous SNV; AA change: NM_001127496:c.T508A:p.C170S; Chr: 5; Co-ordinate: 141694166; Other info: het; ACMG: PS2, PM2, PP3 = Likely pathogenic; Comment: Possibly contributory to brain phenotype
  Variant: Gene: AP4E1; Exonic function: Nonsynonymous SNV; AA change: NM_001252127.1:c.T3140C:p.L1047P, NM_007347.4:c.T3365C:p.L1122P; Chr: 15; Co-ordinate: 51294810; Other info: het; ACMG: N/A; Comment: N/A
  Variant: Gene: AP4E1; Exonic function: Splice-donor SNV; AA change: NM_001252127.1:c.121+2T>C, NM_007347.4:c.346+2T>C; Chr: 15; Co-ordinate: 51207770; Other info: het; ACMG: N/A; Comment: N/A

Patient 45
  Approx age at examination: Between 10 and 15 years old
  Phenotype: Developmental delay and visual inattentiveness noted at 3 months. Athetoid movements with dystonic posturing present by 15 months and seizures noted by 2 years of age. At age four, a diagnosis of autism was suspected but could not be confirmed given the severe to profound ID. MRI: thin corpus callosum, increased ventricle and subarachnoid space size.
  Variant: Gene: CACNB3; Exonic function: Nonsynonymous SNV; AA change: NM_001206915:c.G1289A:p.R430Q; Chr: 12; Co-ordinate: 49221639; Other info: het; ACMG: PS2, PP3 = Uncertain significance; Comment: May play a role in the brain morphological phenotype
  Variant: Gene: SCN3A; Exonic function: Nonsynonymous SNV; AA change: NM_006922.3:c.T626C:p.L209P; Chr: 2; Co-ordinate: 166020196; Other info: het; ACMG: N/A; Comment: Selected as candidate for epilepsy phenotype. Functional studies underway

Patient 51 (b)
  Approx age at examination: Between 15 and 20 years old
  Phenotype: Significant intellectual disability. Gross motor delay. Seizuring. Scoliosis. Some hearing deficiency. Astigmatism and far-sightedness. Remarkable family history. Pregnancy complicated by possible oligohydramnios. Suctioned for meconium and physically stimulated. Placenta was calcified. MRI - asymmetrical lateral ventricles.
  Variant: Gene: SQSTM1; Exonic function: Two base indel, causing a stop-gain; AA change: NM_003900:c.115_116delinsTA:p.A39a; Chr: 5; Co-ordinate: 179248051-179248052; Other info: het; ACMG: PS2, PM2, PP3 = Likely pathogenic; Comment: Unsure of relative contribution of this variant versus others in the same child
  Variant: Gene: UPF1; Exonic function: Two base indel causing a missense mutation; AA change: NM_002911:c.1576_1577delinsAA:p.A526N; Chr: 19; Co-ordinate: 18966765-18966766; Other info: het; ACMG: PS2, PM2, PP3 = Likely pathogenic; Comment: Unsure of relative contribution of this variant versus others in the same child

Patient 42
  Approx age at examination: Less than 5 years old
  Phenotype: Recurrent aspiration. Optic nerve dysfunction detected by absence of light reflex. Height, weight and OFC all at 25%ile. CT - absence of corpus callosum.
  Variant: Gene: LRP2; Exonic function: Nonsynonymous SNV; AA change: NM_004525.2:c.G4351T:p.V1451F; Chr: 2; Co-ordinate: 170094756; Other info: het; ACMG: N/A; Comment: N/A
  Variant: Gene: LRP2; Exonic function: Nonsynonymous SNV; AA change: NM_004525.2:c.A12725G:p.D4242G; Chr: 2; Co-ordinate: 170003335; Other info: het; ACMG: N/A; Comment: N/A

Patient 41
  Phenotype: CT - cerebellar atrophy

Patient 55
  Phenotype: CT - mild dilation of the lateral ventricles

Abbreviations: ID Intellectual Disability, OFC occipito-frontal circumference, CT computerized tomography scan, MRI magnetic resonance imaging scan, PVS1 null variant in a gene where LoF is a known mechanism of disease, PS2 de novo in a patient with the disease and no family history, PM2 absent from controls in exome sequencing project, 1000 genomes project or exome aggregation consortium, PP3 multiple lines of computational evidence support a deleterious effect on the gene or gene product
a Age at examination is given in 5 year intervals in order to protect patient anonymity
b Patient 51 also bears a de novo likely contributory CNV as detailed in the text, in addition to the SNVs given here

SQSTM1 c.115_116delinsTA (p.A39*), UPF1 c.1576_1577delinsAA (p.A526N) and a 1q43 (1:243282457–243447771, hg19) deletion CNV in Patient 51

The patient is severely affected, with significant ID and several major congenital anomalies (Table 1). The heterozygous indel formed by two adjacent SNVs in SQSTM1 causes a stop-gain. SQSTM1 encodes p62, a regulatory factor in Nuclear Factor kappa-B (NF-kB) signaling, NF-E2-related factor 2 (NRF2) activation, ubiquitin-mediated autophagy, and transcription [36]. The SNV is located in the PB1 domain, mutations of which cause Paget Disease of Bone (PDB) and Frontotemporal Dementia and/or Amyotrophic Lateral Sclerosis (FTLD/ALS) [MIM:607485,612069] [36]; both are neurodegenerative conditions that include morphological brain changes. The adjacent SNVs in UPF1 together cause a likely pathogenic missense amino acid change (Table 1 and Additional file 5: Figure S1). UPF1 has an essential role in nonsense-mediated mRNA decay [37].
Interestingly, UPF1 has been shown to remarkably reduce ALS-associated neuronal toxicity in vitro [38] and to protect against motor dysfunction and forelimb paralysis in a rat model for ALS [39]. It is plausible that haploinsufficiency of SQSTM1 may have caused neurofunctional defects, which the haploinsufficiency of UPF1 may have exacerbated. In this regard, it is notable that at 19 years of age, patient 51 presents significant motor deficits, being wheelchair bound, indicative of a possible early onset of ALS. Scoliosis and hearing loss, both among the presentations of PDB, are also already seen in her. These data support the notion that the SNVs in both genes may be contributory toward her presentation.

We further verified a de novo ~165 kb heterozygous deletion that spans CEP170 [MIM:613023] in whole and SDCCAG8 [MIM:613524] in part (Fig. 2a and c) in this patient. CEP170 encodes a component of the centrosome [40]. SDCCAG8 is also involved in centrosome function [41], DNA damage response signaling [42] and neuronal migration [41]. Both genes are suggested as candidates for corpus callosum abnormalities via 1q43 microdeletion [43]; however, this has been contested [44] (Fig. 2c). Our patient presents partial phenotypic overlap with microdeletion 1q43 index cases. The demonstrated roles for SDCCAG8 in DNA-mismatch repair, and for both genes in cell cycle progression, support the notion that this CNV may be contributory. Notably, the haploinsufficiency of a DNA-mismatch repair gene could lead to the high mutation burden detected in this child (see the SNVs above and the section on genome assembly indels). We also confirmed at least one maternally inherited balanced translocation (see the section on CNVs/SVs), which is unlikely to be contributory.

Fig. 2 Details of CNV analyses. a IGV images for heterozygous deletion CNV in patient 51, showing proximal and distal breakpoints. The CNV involves the whole of CEP170 and part of SDCCAG8.
Top, middle and bottom panels are the child’s .bam file, mother’s .bam file and father’s .bam file respectively. Read-depth coverage shows the CNV is de novo (red ovals). b Cartoon of breakpoint junction sequence showing a 24 bp chromosome 16 sequence (green box) and a 107 bp chromosome 5 sequence (yellow box) inserted between the proximal and distal breakpoints on chromosome 1q43. Yellow shaded segment shows sequence microhomology: this 14 bp sequence (TTGGGAGTAGAGGG) is found at chromosome 5:40,069,598-40,069,612 and at chromosome 1:243,447,747-243,447,761 (hg19). Sanger sequence trace images are overlaid, confirming the CNV breakpoint. Grey arrows denote PCR forward and reverse primers. N denotes DNA repeat sequence. c Genomic interval involved in the de novo CNV detected in patient 51, UCSC genome browser (hg19). Red highlighted box shows the region involved in the deletion in our patient. Yellow boxes show the critical region for 1q43-44 syndrome defined by Nagamani et al. Green box shows the critical region as defined by Perlman et al. N.B. Nagamani et al. also highlight ZBTB18 (old name ZNF238) in their critical region

Mutation burden assessment in large secondary positive and negative control cohorts supports candidacy of novel genes

We investigated the candidacy of the above verified genes by assessing the incidence of damaging mutation in large positive and negative cohorts with comparable NGS data. We looked for ‘potentially damaging SNVs’ (PDSs) (Additional file 5: Figure S2 gives an example of per patient PDS mutation burden) in our candidate genes, from WES of 2081 patients with neurodevelopmental and neurocognitive phenotypes from the UK10K cohort [15] (Additional file 4: Table S3 and Additional file 5: Figure S3), and compared that to the incidence in WGS from 2535 healthy people from the 1000 Genomes project [16].

We first screened for the exact variant detected in our discovery cohort, and did not find any case of an exact match.
We then conducted a gene-wise PDS screen and observed that the incidence of PDSs in ARID1B, SPRY4, CACNB3, SQSTM1 and UPF1 was significantly enriched in the positive versus negative control cohorts (Fig. 3a). There was no significance for PHF6; however, the two PDSs found in 4616 people were insufficient for meaningful statistical assessment. The extremely high PDS burden in the positive control cohort for SQSTM1 and UPF1 is noteworthy, as these genes have not previously been reported for ID to our knowledge and, further, the indels in both are found in the same patient in our cohort.

We do not have access to the clinical data needed for a classical genotype-phenotype correlation between cases in the positive control cohort and our patients who have the same gene affected; the large number of such cases in the positive control cohort also puts such a study beyond the scope of this work. We therefore assessed whether our findings could be due to random chance by bootstrapping the UK10K cohort a thousand times, each time for PDSs in six randomly selected genes. The bootstrap analysis showed that the mean and median gene-wise variant frequency for our six candidate genes was greater than that of the corresponding distribution, indicating that our findings were not likely due to chance (Fig. 3b & c). These data are consistent with an association of at least five of our candidate genes with neurodevelopmental abnormalities.

Fig. 3 Validation study. a Incidence of potentially damaging SNVs (PDSs) in both the positive control (UK10K) and negative control (1000G) cohorts. * denotes statistical significance (at p < 0.05, Fisher’s exact test). b and c Results of bootstrap analyses for PDSs in 6 randomly selected genes. Red vertical bar shows the mean and median result for PDSs in our 6 candidate genes

Novel candidate genes converge onto the ubiquitin proteasome pathway, which also bears significant mutation burden in 2081 positive control WES samples

We investigated molecular links between our pathogenic and candidate genes; focused IPA and STRING pathway analyses revealed that all six connected to the ubiquitin proteasome degradation pathway (UPP) (Additional file 5: Figure S4), which has important roles in the structural development and function of the brain [45, 46]. We assessed the relative importance of this pathway and found that the UPP was among the significantly enriched pathways for PDSs when compared with all KEGG pathway categories (n = 55) (Additional file 7: Table S5) in the UK10K patient cohort (p = 0.031), substantiating the importance of the UPP in brain development.

Mendelian inheritance and N of 1 analyses provide additional candidate variants

In addition to our in silico refinement and test for candidate de novo SNVs above, we also conducted a classical series of N of 1 studies for these eight patients, manually assessing the possible candidacy of variants selected by all possible Mendelian inheritance patterns (Additional file 8: Table S6). Compound heterozygous missense mutations were identified in LRP2 [MIM:600073], causative of the autosomal recessive Donnai Barrow syndrome [MIM:222448], in patient 42. Absence of the corpus callosum, reported in our patient, presents in Donnai Barrow syndrome. Compound heterozygous mutations were identified in AP4E1 [MIM:607244], causative of autosomal recessive spastic paraplegia type 51, in patient 59. This patient reported a seizure at 12 years of age, but exhibits neither the severe neurological phenotypes nor the shy demeanour reported for a possible syndromic form of ID [47, 48] caused by defects in adaptor protein complex-4, of which AP4E1 encodes one of the four subunits.
A missense SCN3A [MIM:182391] SNV (p.Leu209Pro/c.626T > C) in patient 45 was selected due to the association of SCN3A with epilepsy [49] (a phenotype presented by our patient) and the predicted deleterious effect of the variant, and was Sanger verified as de novo. Functional studies are underway to further investigate the role of SCN3A in epilepsy.

Extensive copy number variant (CNV) and structural variant (SV) analyses identify likely causative CNV missed by clinical CMA, and balanced benign translocation

We conducted both alignment-based (FREEC, CNAseq, DELLY) and de novo assembly-based (ABySS) CNV/SV analyses. CNVs, i.e., duplications (gains) and deletions (losses), were identified by all four platforms, while SVs, i.e., translocations and inversions, were identified by DELLY and ABySS (Table 2). An average of 58 de novo gain CNVs and 128 de novo loss CNVs across all eight patients were detected. However, only 46 CNVs were called by more than one platform, and none were called by more than two (Fig. 4), with the majority of each algorithm’s findings being unique. We carried out extensive visual in silico curation for all CNVs and selected three to verify, of which only the previously discussed 1q43 loss CNV was Sanger verified as de novo; it was detected by FREEC and CNAseq, and is clearly visible on IGV (Fig. 2a). The breakpoint junction sequence reveals a complex architecture (Fig. 2b).

Similar to our CNV results, SV results from DELLY and ABySS were divergent (Table 2). Only one translocation (between chromosomes 19 and 1), in patient 41, was called by both, and there was no concordance among inversions. Upon extensive manual in silico curation we selected 10 translocations and 1 inversion to verify (Additional file 9: Table S7), but none verified as de novo.
Sanger verification for these lesions was challenging because breakpoints mapped to repeat-masked regions; nevertheless, one translocation in patient 51 verified as maternally inherited: a chromosome X-2 (92696685:225020555, hg19) translocation that does not disrupt any gene. The breakpoint junction shows a single base addition (Fig. 6a).

Genome assembly yields small insertions/deletions (indels) missed by genome re-alignment
In contrast to the re-alignment based algorithms, ABySS [13] identified over 700 potential de novo indels (maximum size 100 bp) via genome assembly. Forty-three indels were refined as likely true positives of functional importance because they had at least seven spanning reads and produced a protein coding change, the majority being in patient 51. For consistency, we conducted a pathway analysis of the indel-bearing genes and a manual curation to select candidates for verification, as we had done for our SNVs. This resulted in 14 indels that were Sanger tested (Additional file 1); however, one was a false positive, five were inherited, and eight did not pass PCR quality checks (Additional file 9: Table S7), indicating that they are located in repeated DNA sequence, which hampered amplification of the region for Sanger sequencing.

Gene regulatory region variation identified in known and hypothesized ID genes
We investigated gene regulatory sequence variation, which we term ‘de novo variants in possible regulatory regions’ (DVPRRs). We filtered the DVPRRs for potentially pathogenic changes using two approaches: by screening for involvement in known ID genes, and on the basis of our hypothesized involvement of the UPP. An average of ~30,000 de novo SNVs were found across our eight patients in the non-coding genome (Fig. 5a).
Of these, an average of 2909 were located in transcription factor binding sites and an average of 514 in putative gene promoters (an average of 191 of the promoter variants also fell within transcription factor binding site regions); an average of 210 were located in regions annotated as enhancers by the FANTOM consortium [50], an average of 263 in 5′ or 3′ UTR regions, and an average of 58 in highly conserved ultra-sensitive regions [51]. We considered these to be DVPRRs, giving an average of 3763 DVPRRs across all eight patients (Fig. 5a). We then intersected the DVPRRs with 995 genes known to cause developmental delay (‘DDD genes’) [52] in a disease gene screen approach and, because our candidate genes converged upon the UPP, with the 137 genes of the UPP (KEGG) in a hypothesis-driven approach. As a final step for enhancers and ultra-sensitive regions, we further selected DVPRRs for which the variant and the candidate gene (DDD genes or UPP genes) were located within the same topological domain [53], postulating that their physical proximity would imply that the regulatory region in question did in fact impact the targeted gene. In summary, these filtrations combined yielded an average of 56 and 11 DVPRRs per patient in our gene-screen and hypothesis-driven approaches, respectively (Additional file 10: Table S8) (Fig. 5a). We also interrogated high-confidence CNVs in the same manner, but only found association to SDCCAG8, a known ID gene present in the previously discussed 1q43 microdeletion (Fig. 5b and Additional file 10: Table S8).

Occurrence of de novo SNVs in non-coding RNAs (ncRNA)
We found an average of 241 high confidence de novo SNVs that were located in sequence annotated as ncRNA across all eight patients.
A majority of these (average 195) fall within introns, while an average of 39 are exonic, an average of 0.25 are predicted to lie in splice junction sequence, and an average of 5 and 2 are located in 3′ and 5′ UTRs, respectively.

Table 2 Number of copy number variants and structural variants identified
Patient  FREEC          CNAseq         DELLY                      ABySS
         Gains  Losses  Gains  Losses  Gains  Losses  Inv  Trans  Gains  Losses  Inv  Trans
42       10     29      8      11      0      0       0    1      4      11      1    8
55       8      22      17     7       0      0       0    2      0      5       1    2
58       8      18      5      8       0      2       0    1      2      10      2    1
41       14     13      13     16      10     60      16   19     4      23      0    4
59       8      18      7      10      0      2       0    2      4      17      2    11
43       14     18      5      5       0      3       0    0      16     25      1    5
51       6      19      13     11      0      0       0    0      30     113     5    21
45       9      12      11     9       1      1       0    1      5      14      2    7
Totals   77     149     79     77      11     68      16   26     65     218     14   59

Fig. 4 Venn diagram showing CNVs found by each algorithm (G = gain, L = loss)

Discussion
Selection of candidate SNVs: comparison of strategies
An effective strategy is essential to select causative SNVs from NGS data. Standard filtration approaches (e.g., variant quality, mapping quality, minimum read depth, and functional variants that are not common polymorphisms) yield potential de novo variants that then must be carefully sifted for likely true candidates. In keeping with others [54], we found an average of 6 ± 2 candidate unverified de novo SNVs (Additional file 8: Table S6), and it was necessary to implement an effective prioritization approach for verification. Discovery WGS and WES studies published to date have used a large sample size [2], detailed pedigree information [55], or well characterized rare syndromes [56] as study cohorts, leveraging the power of numbers, inheritance pattern, and phenotypic commonality, respectively, as filtration strategies. Inasmuch as we did not have a large cohort, all of our cases were sporadic, and none had a recognized dysmorphic syndrome, we refined SNVs objectively, by selecting genes known to be involved in brain development pathways.
We reasoned that this systematic approach would reduce the subjective bias inherent in an N of 1 genotype-phenotype correlation, and thereby identified potential candidates. However, a subjective screen for SNVs yielded the likely damaging variant in SCN3A, which was not stratified by our objective approach, highlighting the limitation of pathway analysis programs that depend on available gene-functional annotations. Notably, SCN3A was selected by a team of biochemical geneticists specifically with respect to the epilepsy presented by the child. Thus a subjective approach may also miss results detected from objective screening, as exemplified in this case, where the two analyses were done by independent members and each did not report the result of the other.

Fig. 5 Schematic of filtration pipeline for variants in non-coding regions. a Schematic for SNVs. b Schematic for CNVs. Abbreviations: SNV, single nucleotide variant; TFBS, transcription factor binding site; FANTOM, enhancer sequence as annotated by the FANTOM consortium; UTR, untranslated regions; DDD, Deciphering Developmental Disorders; UPP, ubiquitin proteasome degradation pathway; CN, copy number. Patient 42 had DVPRR in the UTRs of two genes; CBL and UBE3B. Patient 59 had a DVPRR in the promoter of UBE3A, patient 43 had a DVPRR in the promoter of CUL4B, and patient 42 had DVPRRs in the promoters of UBE3A, CUL4B and CUL7 (Additional file 10: Table S8)

Interpreting detected variants; discovery study findings further inform genetic complexity for ID
Variable expressivity and reduced penetrance are well known in the pathogenicity of ID, and it is increasingly recognized that a single mutation in a single gene may only rarely explain the full phenotypic spectrum [1]. Our results provide further indications of such complex heritability; in patient 51, the 1q43 deletion and the SNVs in SQSTM1 and UPF1 may act in concert to produce the complex and severe phenotype in this patient.
In patient 58, we identified both compound heterozygous variants in the known ID gene AP4E1, which act in a recessive model, and a de novo variant in a novel gene, SPRY4, which has important functions in brain development. De novo mutation is recognized to play an important role particularly in the pathogenicity of ID [57], and it is difficult to determine to what extent each of these variants, if at all, contributes to disease burden in this patient. The same is true for patient 45, in whom de novo variants in two novel genes, CACNB3 and SCN3A, were identified. We note that patient 51, who bears the most complex genotype, is the most severely affected in our cohort; in this case, clinical severity does correlate with the number and complexity of genomic alterations, suggesting that gradation of clinical severity may prove useful toward assessing the contribution of genomic alterations.

It is recognized that genes responsible for ID converge onto common networks [1, 58]. The candidate genes we identified converge onto the UPP, which is critically involved in neurodegenerative disease [45] and has important roles in neurodevelopmental disorders [45, 46]. This observation is consistent with the notion that they may be good candidates, and exemplifies the usefulness of probing molecular links among novel findings.

Large secondary positive WES cohort analysis supports novel findings
Novel SNV findings from NGS studies require rigorous additional studies to support proof of pathogenicity [14]. In our case, several of our novel candidates cause missense variation, whose effect is difficult to model, as opposed to clear loss of function mutations, which are amenable to functional studies in model organisms. In addition, we were unable to conduct traditional genotype-phenotype correlation studies as none of our patients had a recognizable syndrome to match with other patients.
Therefore, our approach of using a large secondary positive control cohort, despite the phenotypic spectra not matching our cases precisely, gave us sufficient ability to test the predicted causality of our candidate genes and was the best strategy available. We were hampered by the lack of an optimal negative control cohort for comparison. We used WGS data from the 1000 Genomes Project, which we recognize is primarily composed of low coverage samples whose phenotypic spectrum is poorly characterized (thus likely yielding false negative data or, conversely, identifying variants in ‘normal’ individuals who are in fact affected); yet the similarity of sample size between the two groups allowed us to explore the PDS distribution for these genes reasonably, providing a useful contributory analysis toward assessing their likely pathogenicity. Finally, this large cohort enabled us to further probe the convergence of our candidate genes upon the UPP, by assessing its contribution versus other biological pathways.

WGS is able to detect structural variants below the threshold of clinical CMA, and enables mechanistic insights into CNV formation
By using WGS instead of WES, we were able to detect a CNV below clinical CMA resolution, isolate its breakpoints, and uncover a possible complex genomic landscape in one patient. We wanted to conduct a comprehensive screen for CNVs and other structural variants to maximize sensitivity. Therefore we used four approaches that are fundamentally different: CNAseq and FREEC are sequence-based copy-number estimators that use categorically different algorithmic approaches for background correction; DELLY is an alignment-based structural variant caller that integrates paired-end and split-read analysis [12]; and ABySS is a de novo genome assembler.
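As a rough illustration of how calls from several algorithms might be compared, the sketch below lists calls that two callers share under a reciprocal-overlap rule. The 50% threshold, the function names, and the toy coordinates are assumptions made for illustration only; the criterion actually used to score concordance between platforms is not stated here.

```python
from itertools import combinations


def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions for intervals a and b.

    Each interval is a (chrom, start, end) tuple with end > start.
    """
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def concordant_pairs(calls_by_caller, min_overlap=0.5):
    """List (caller_1, caller_2, call_1, call_2) for calls that agree."""
    pairs = []
    for (name_1, calls_1), (name_2, calls_2) in combinations(calls_by_caller.items(), 2):
        for call_1 in calls_1:
            for call_2 in calls_2:
                if reciprocal_overlap(call_1, call_2) >= min_overlap:
                    pairs.append((name_1, name_2, call_1, call_2))
    return pairs


# Hypothetical loss calls; these coordinates are invented, not study data.
toy_calls = {
    "FREEC": [("chr1", 1000, 2000), ("chr5", 500, 900)],
    "CNAseq": [("chr1", 1100, 2050)],
    "DELLY": [("chr7", 100, 400)],
    "ABySS": [("chr5", 5000, 5600)],
}

shared = concordant_pairs(toy_calls)
for caller_1, caller_2, call_1, call_2 in shared:
    print(caller_1, caller_2, call_1, call_2)
```

A reciprocal rule is only one possible choice: it penalizes pairs in which a small call sits inside a much larger one, which may or may not be desirable when read-depth and assembly-based callers report breakpoints at very different resolutions.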
Since each algorithm was optimized differently, each yielded different results. For example, CNAseq executes read-depth based binning, and hence aggregates results at telomeres and centromeres, where a larger number of reads re-align due to pervasive repeat sequence (Additional file 5: Figure S5). The verified CNV we detected was identified only by CNAseq and FREEC, and was missed by the other algorithms. Therefore, we caution against using only one CNV detection algorithm, as this would reduce sensitivity. The breakpoint junction sequence in the case of the confirmed 1q43 microdeletion is consistent with the notion that it could be caused by chromothripsis, a mechanism only recently reported in the constitutional genome [59], further demonstrating the utility of WGS data.

Fig. 6 a Sanger sequencing verification of translocation in patient 51, with karyotype cartoon of balanced translocation. PCR amplicon trace file shows sequence mapping across the chromosome X-2 translocation boundary. Zoomed-in view shows single base addition at breakpoint junction. b Circos plot showing mutation burden for patient 51 called by ABySS genome assembly

WGS enabled a de novo genome assembly that unmasked hidden genome complexity
ABySS de novo assembly identified a translocation missed by DELLY, and also detected a higher than usual number of putative indels in patient 51, who was found to have a remarkably unstable genome masked by standard genome re-alignment based analysis (Fig. 6b). However, we experienced difficulty confirming these events via Sanger sequencing, which was due, in part, to the high degree of repeated sequence at breakpoint junctions. Genome assembly is able to call events in repetitive sequence better than alignment-based algorithms [13], though conversely such events are harder to independently verify.
We are among the first to use de novo assembly to interrogate patients with ID, and our findings suggest that variation located in repeat-enriched sequence is currently under-ascertained in the constitutional genome.

WGS is able to interrogate regulatory genomic sequence
Meaningful interpretation of SNVs within regulatory sequence is hampered by the sparsity of annotations for the non-coding genome. We implemented two different filtering strategies in order to identify non-coding SNVs that could have a functional impact, and also used topological domain data to further refine good candidates. Although we were able to reduce the number of candidate DVPRRs from an average of >3700 to dozens in the case of our gene-screen approach and a handful in the case of our hypothesis-driven approach, meaningful interpretations are precluded without further focused studies. In contrast, assessing the impact of CNV-based DVPRRs is theoretically less challenging, as it is more straightforward to predict the functional outcome of a complete loss or gain of a possible regulatory sequence. In summary, although clinically relevant conclusions for DVPRRs will require case-by-case analysis and extensive follow-up functional studies, we note that it is possible to stratify DVPRRs in the context of known causative genes for ID using WGS.

WGS versus WES
WGS yields a comprehensive screen of the genome because, in addition to coding variation, it allows investigation of structural variation at a fine scale, as discussed above, and of variation in possible gene regulatory sequence as well as in ‘non-coding genes’ such as ncRNAs, for which there is a paucity of information in the context of neurodevelopmental disease. While we show that strategic stratification of DVPRRs can yield results potentially relevant to ID causation, much less is possible for annotation of SNVs within ncRNA sequence, of which we detect an average of 241 across our samples.
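The DVPRR counts reported in the Results can be reproduced with simple arithmetic, and the two filtering strategies can be sketched as a gene-list intersection followed by a topological domain check. This is a minimal sketch, assuming that the 191 variants lying in both promoters and transcription factor binding sites are counted once and that the remaining categories do not overlap; the record fields, domain labels, gene subset, and toy variants are hypothetical and are not study data.

```python
# Average per-patient counts as reported in the Results.
tfbs = 2909               # transcription factor binding sites
promoter = 514            # putative gene promoters
promoter_and_tfbs = 191   # promoter variants that also fall within TFBS regions
enhancer = 210            # FANTOM-annotated enhancers
utr = 263                 # 5' or 3' UTR regions
ultra_sensitive = 58      # highly conserved ultra-sensitive regions

# Inclusion-exclusion: subtract the promoter/TFBS overlap so it is counted once.
total_dvprr = tfbs + promoter - promoter_and_tfbs + enhancer + utr + ultra_sensitive
print(total_dvprr)

# Region types for which the variant and gene must share a topological domain.
NEEDS_SHARED_DOMAIN = {"enhancer", "ultra_sensitive"}


def stratify(variants, gene_list):
    """Keep variants assigned to a listed gene, applying the domain check
    only to enhancer and ultra-sensitive region variants."""
    kept = []
    for variant in variants:
        if variant["gene"] not in gene_list:
            continue
        if (variant["region"] in NEEDS_SHARED_DOMAIN
                and variant["variant_domain"] != variant["gene_domain"]):
            continue
        kept.append(variant)
    return kept


# Hypothetical records and a hypothetical subset of a candidate gene list.
toy_gene_list = {"UBE3A", "CUL4B", "CUL7"}
toy_variants = [
    {"id": "v1", "region": "promoter", "gene": "UBE3A",
     "variant_domain": "d1", "gene_domain": "d1"},
    {"id": "v2", "region": "enhancer", "gene": "CUL4B",
     "variant_domain": "d7", "gene_domain": "d7"},
    {"id": "v3", "region": "enhancer", "gene": "CUL7",
     "variant_domain": "d2", "gene_domain": "d9"},
    {"id": "v4", "region": "utr", "gene": "GENE_X",
     "variant_domain": "d3", "gene_domain": "d3"},
]

kept = stratify(toy_variants, toy_gene_list)
print([variant["id"] for variant in kept])
```

The same function can be run once with a disease gene list and once with a pathway gene list, which mirrors the gene-screen and hypothesis-driven approaches in spirit.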
Nevertheless, initial screens such as ours importantly generate exploratory information on non-coding sequence variation that is possible only by WGS.

We note that all the SNVs we identified as involved in disease would have been possible to detect by WES. However, WGS yields a more complete view of possible pathogenic variation in each child. This is exemplified in the case of patient 51: had only WES been performed, the SQSTM1 and UPF1 SNVs would likely have been detected, but the 1q43 microdeletion would not have been identified. In the case of this patient, it is unclear what the gene-effect size for each variant is. Conversely, in the case of patient 43, in whom we detected the SNV in ARID1B, we are more certain of the penetrance of this variant because our WGS data analyses found no other potentially causative variation in their genome (i.e., they do not have any CNVs or SVs). These data argue in favor of WGS over WES for clinical use.

Conclusions
This is the first study to present extensive analyses of WGS data in the context of ID, for causative SNV and CNV/SV in both coding and non-coding sequence, and the first to present results from de novo assembly of the genome. In a heterogeneous group of eight children with ID and morphological brain defects, we were able to identify candidate causative variants, highlight neurodevelopmental pathways, and unearth hidden genome instability, demonstrating the efficacy of a discovery approach to WGS analyses in the context of ID.

Additional files
Additional file 1: Supplementary Methods - Additional details on methods presented succinctly in main text. (DOCX 53 kb)
Additional file 2: Table S1. Test of relatedness - Table showing relatedness for each trio by comparing SNP concordance between child, mother and father. (XLSX 17 kb)
Additional file 3: Table S2. Primers for verification - All primer sequences used for SNV, CNV and SV verification. (XLSX 18 kb)
Additional file 4: Table S3.
UK10K study cohorts that comprise the positive control cohort – Table giving descriptions of the study cohorts from the UK10K project that comprised the positive control cohorts and conditions for use. (XLSX 13 kb)
Additional file 5: Supplementary Figures. Figure S1. IGV image and Sanger verification trace files for indel in ARID1B and missense variation in UPF1. Figure S2. UK10K mutation load – counts as per variant annotation type on one patient. Figure S3. Histogram of mutation burden per patient in the UK10K cohort. Figure S4. Pathway interactions showing convergence onto UPP pathway. Figure S5. Plots for CNV distribution for two chromosomes as called by CNAseq. (DOCX 1972 kb)
Additional file 6: Table S4. Detailed variant information for all verified variants, including ACMG classification – Table showing details for variant classification such as pathogenicity prediction algorithms results and details of ACMG criteria application. (XLSX 18 kb)
Additional file 7: Table S5. Significantly enriched KEGG pathways for PDS burden in positive control cohort – Details for pathway enrichment analysis showing all KEGG pathways and burden of enrichment for each in the positive control cohort. (XLSX 15 kb)
Additional file 8: Table S6. Raw data for SNVs detected across all patients by Mendelian Inheritance Pattern filtering – all SNVs detected per trio, organized as one trio per sheet, using the Mendelian inheritance filtering. Sheet 1 gives the legend. All SNVs are annotated, including CADD and RVIS scores. (XLSX 57 kb)
Additional file 9: Table S7. CNV/SV/Indel Verification Summary – Table giving a list of all indels, CNVs and SVs (inversions and translocations), and details on their verification. (XLSX 15 kb)
Additional file 10: Table S8. Summary of SNVs and CNVs across 8 ID patients in DVPRR – Details on all SNVs and CNVs detected in DVPRR for all patients, organized by those intersecting with DDD genes and those with UPP genes.
(XLSX 40 kb)

Abbreviations
1000G project: http://www.internationalgenome.org/; CMA: Clinical microarray; CNV: Copy number variant; dbSNP135: https://www.ncbi.nlm.nih.gov/snp; DGV: Database of Genomic Variants; DGV: http://dgv.tcag.ca/dgv/app/home; DVPRR: de novo variants in possible regulatory regions; FANTOM consortium: http://fantom.gsc.riken.jp/; GRCh37-lite/hg19a: http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/README.GRCh37-lite; KEGG: www.genome.jp/kegg/pathway.html; NGS: Next generation sequencing; NHLBI-ESP: http://evs.gs.washington.edu/EVS/; OMIM: www.omim.org; PDS: Potentially damaging SNVs; Picard: https://broadinstitute.github.io/picard/; SNV: Single nucleotide variant; STRING: www.string-db.org; SV: Structural variant; UK10K project: www.uk10k.org; UPP: Ubiquitin proteasome pathway; WES: Whole exome sequencing; WGS: Whole genome sequencing

Acknowledgements
We thank Erica Tsang for assistance in patient selection and Jenny Poon for DNA extraction. We thank Patricia Birch for obtaining institutional review boards ethics approvals. We thank the bioinformatics team at Canada’s Michael Smith Genome Sciences Center. We thank Casper Shyr for bioinformatics support. We acknowledge the following UK10K projects: Edinburgh MR-psychosis samples, Edinburgh Schizophrenia Samples, The National Institute for Health and Welfare (THL) Finnish Schizophrenia Families from the “The genetic etiology of severe mental disorders in Finland” study, EGAS121, CardiffScz, Scottish schizophrenia cases, Trinity College Dublin Autism Genetics Collection, The Molecular Genetics of Neuromuscular Disorders Study, The Familial Intellectual Disability study. This work was supported by the Canadian Institute of Health Research (CIHR). F.R.Z was supported by a CIHR Post-Doctoral Scholarship, NeuroDevNet Post-Doctoral Fellowship, and a University of British Columbia (UBC) Bluma Tischler Post-Doctoral Fellowship.
E.L.L is supported by a CIHR Doctoral Award and a UBC Four Year Fellowship. CvK is recipient of the Michael Smith Foundation for Health Research Scholar Award. Finally we thank the patients and their families. All authors declare no conflict of interest.

Funding
This work was supported by the Canadian Institute of Health Research [grant number MOP-102600]. The funding institution had no direct role in study design, sample collection, analysis, interpretation of data, nor in manuscript writing. Annual reports were submitted to the funding institution tracking the progress of the project.

Availability of data and materials
The dataset supporting the conclusions of this article is available in the European Genome-Phenome Archive database (https://www.ebi.ac.uk/ega/studies/EGAS00001001386).

Authors’ contributions
FRZ, JMF, and MAM conceived and designed the study, interpreted the data and wrote and reviewed the manuscript. FRZ managed the study, and conducted the objective pathway analyses for WGS data. JM performed N of 1 analyses of WGS data and copy number analyses, and provided editorial input. H-YEC performed gene regulatory region variation analyses and participated in secondary control study. ELL performed secondary control study and bootstrap analysis. KM performed ABySS indel analysis. CvK performed N of 1 study for patient 45. MC performed TAD analyses for regulatory region study. LL extracted DNA and performed Sanger verification. PB obtained research ethics boards approvals. NM, LLA, CNB, BAM and SLL recruited patients. SJMJ oversaw the bioinformatics analyses by the GSC. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Ethics approval and consent to participate
This study is approved by the British Columbia Children’s and Women’s hospital research ethics boards (H10-00695).
All families (parents for themselves, and on behalf of children) provided written informed consent to participate.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details
1 Canada’s Michael Smith Genome Sciences Center, Vancouver, BC V5Z 4S6, Canada. 2 Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada. 3 Department of Pediatrics, Centre for Molecular Medicine & Therapeutics Child & Family Research Institute, University of British Columbia, Vancouver, BC V6T 1Z4, Canada. 4 Provincial Medical Genetics Programme, Children’s & Women’s Health Centre of British Columbia, Vancouver, BC V6H 3N1, Canada. 5 Qatar Biomedical Research Institute, Hamad Bin Khalifa University, P.O. Box 34110, Doha, Qatar.

Received: 4 November 2016 Accepted: 29 March 2017

References
1. Vissers LE, Gilissen C, Veltman JA. Genetic studies in intellectual disability and related disorders. Nat Rev Genet. 2016;17(1):9–18.
2. Gilissen C, Hehir-Kwa JY, Thung DT, van de Vorst M, van Bon BW, Willemsen MH, Kwint M, Janssen IM, Hoischen A, Schenck A, et al. Genome sequencing identifies major causes of severe intellectual disability. Nature. 2014;511(7509):344–7.
3. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
4. Fejes AP, Khodabakhshi AH, Birol I, Jones SJ. Human variation database: an open-source database template for genomic discovery. Bioinformatics. 2011;27(8):1155–6.
5. Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 2012;40(Web Server issue):W452–7.
6. Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010;20(1):110–21.
7.
Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7(4):248–9.
8. Schwarz JM, Rodelsperger C, Schuelke M, Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods. 2010;7(8):575–6.
9. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–5.
10. Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012;28(3):423–5.
11. Jones SJ, Laskin J, Li YY, Griffith OL, An J, Bilenky M, Butterfield YS, Cezard T, Chuah E, Corbett R, et al. Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol. 2010;11(8):R82.
12. Rausch T, Zichner T, Schlattl A, Stutz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28(18):i333–9.
13. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19(6):1117–23.
14. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, Grody WW, Hegde M, Lyon E, Spector E, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–24.
15. Consortium UK, Walter K, Min JL, Huang J, Crooks L, Memari Y, McCarthy S, Perry JR, Xu C, Futema M, et al. The UK10K project identifies rare variants in health and disease. Nature. 2015;526(7571):82–90.
16.
Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74.
17. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
18. Tucker T, Zahir FR, Griffith M, Delaney A, Chai D, Tsang E, Lemyre E, Dobrzeniecka S, Marra M, Eydoux P, et al. Single exon-resolution targeted chromosomal microarray analysis of known and candidate intellectual disability genes. Eur J Hum Genet. 2014;22(6):792–800.
19. Halgren C, Kjaergaard S, Bak M, Hansen C, El-Schich Z, Anderson CM, Henriksen KF, Hjalgrim H, Kirchhoff M, Bijlsma EK, et al. Corpus callosum abnormalities, intellectual disability, speech impairment, and autism in patients with haploinsufficiency of ARID1B. Clin Genet. 2012;82(3):248–55.
20. Lower KM, Turner G, Kerr BA, Mathews KD, Shaw MA, Gedeon AK, Schelley S, Hoyme HE, White SM, Delatycki MB, et al. Mutations in PHF6 are associated with Borjeson-Forssman-Lehmann syndrome. Nat Genet. 2002;32(4):661–5.
21. Wieczorek D, Bogershausen N, Beleggia F, Steiner-Haldenstatt S, Pohl E, Li Y, Milz E, Martin M, Thiele H, Altmuller J, et al. A comprehensive molecular study on Coffin-Siris and Nicolaides-Baraitser syndromes identifies a broad molecular and clinical spectrum converging on altered chromatin remodeling. Hum Mol Genet. 2013;22(25):5121–35.
22. Zweier C, Kraus C, Brueton L, Cole T, Degenhardt F, Engels H, Gillessen-Kaesbach G, Graul-Neumann L, Horn D, Hoyer J, et al. A new face of Borjeson-Forssman-Lehmann syndrome? De novo mutations in PHF6 in seven females with a distinct phenotype. J Med Genet. 2013;50(12):838–47.
23. Berland S, Alme K, Brendehaug A, Houge G, Hovland R. PHF6 deletions may cause Borjeson-Forssman-Lehmann syndrome in females. Mol Syndromol. 2011;1(6):294–300.
24. Kosho T, Miyake N, Carey JC.
Coffin-Siris syndrome and related disorders involving components of the BAF (mSWI/SNF) complex: historical review and recent advances using next generation sequencing. Am J Med Genet C Semin Med Genet. 2014;166C(3):241–51.
25. Todd MA, Picketts DJ. PHF6 interacts with the nucleosome remodeling and deacetylation (NuRD) complex. J Proteome Res. 2012;11(8):4326–37.
26. Zhang C, Mejia LA, Huang J, Valnegri P, Bennett EJ, Anckar J, Jahani-Asl A, Gallardo G, Ikeuchi Y, Yamada T, et al. The X-linked intellectual disability protein PHF6 associates with the PAF1 complex and regulates neuronal migration in the mammalian brain. Neuron. 2013;78(6):986–93.
27. Zhang S, Lin Y, Itaranta P, Yagi A, Vainio S. Expression of Sprouty genes 1, 2 and 4 during mouse organogenesis. Mech Dev. 2001;109(2):367–70.
28. Yu T, Yaguchi Y, Echevarria D, Martinez S, Basson MA. Sprouty genes prevent excessive FGF signalling in multiple cell types throughout development of the cerebellum. Development. 2011;138(14):2957–68.
29. Hausott B, Vallant N, Schlick B, Auer M, Nimmervoll B, Obermair GJ, Schwarzer C, Dai F, Brand-Saberi B, Klimaschewski L. Sprouty2 and −4 regulate axon outgrowth by hippocampal neurons. Hippocampus. 2012;22(3):434–41.
30. Dyer C, Blanc E, Hanisch A, Roehl H, Otto GW, Yu T, Basson MA, Knight R. A bi-modal function of Wnt signalling directs an FGF activity gradient to spatially regulate neuronal differentiation in the midbrain. Development. 2014;141(1):63–72.
31. Labalette C, Bouchoucha YX, Wassef MA, Gongal PA, Le Men J, Becker T, Gilardi-Hebenstreit P, Charnay P. Hindbrain patterning requires fine-tuning of early krox20 transcription by Sprouty 4. Development. 2011;138(2):317–26.
32. Wang YH, Beck CW. Distal expression of sprouty (spry) genes during Xenopus laevis limb development and regeneration. Gene Expr Patterns. 2014;15(1):61–6.
33. Cork RJ, Namkung Y, Shin HS, Mize RR. Development of the visual pathway is disrupted in mice with a targeted disruption of the calcium channel beta(3)-subunit gene.
J Comp Neurol. 2001;440(2):177–91.
34. Murakami M, Nakagawasai O, Yanai K, Nunoki K, Tan-No K, Tadano T, Iijima T. Modified behavioral characteristics following ablation of the voltage-dependent calcium channel beta3 subunit. Brain Res. 2007;1160:102–12.
35. Bidaud I, Mezghrani A, Swayne LA, Monteil A, Lory P. Voltage-gated calcium channels in genetic diseases. Biochim Biophys Acta. 2006;1763(11):1169–74.
36. Rea SL, Majcher V, Searle MS, Layfield R. SQSTM1 mutations–bridging Paget disease of bone and ALS/FTLD. Exp Cell Res. 2014;325(1):27–37.
37. Franks TM, Singh G, Lykke-Andersen J. Upf1 ATPase-dependent mRNP disassembly is required for completion of nonsense-mediated mRNA decay. Cell. 2010;143(6):938–50.
38. Barmada SJ, Ju S, Arjun A, Batarse A, Archbold HC, Peisach D, Li X, Zhang Y, Tank EM, Qiu H, et al. Amelioration of toxicity in neuronal models of amyotrophic lateral sclerosis by hUPF1. Proc Natl Acad Sci U S A. 2015;112(25):7821–6.
39. Jackson KL, Dayton RD, Orchard EA, Ju S, Ringe D, Petsko GA, Maquat LE, Klein RL. Preservation of forelimb function by UPF1 gene therapy in a rat model of TDP-43-induced motor paralysis. Gene Ther. 2015;22(1):20–8.
40. Guarguaglini G, Duncan PI, Stierhof YD, Holmstrom T, Duensing S, Nigg EA. The forkhead-associated domain protein Cep170 interacts with Polo-like kinase 1 and serves as a marker for mature centrioles. Mol Biol Cell. 2005;16(3):1095–107.
41. Insolera R, Shao W, Airik R, Hildebrandt F, Shi SH. SDCCAG8 regulates pericentriolar material recruitment and neuronal migration in the developing cortex. Neuron. 2014;83(4):805–22.
42. Airik R, Slaats GG, Guo Z, Weiss AC, Khan N, Ghosh A, Hurd TW, Bekker-Jensen S, Schroder JM, Elledge SJ, et al. Renal-retinal ciliopathy gene Sdccag8 regulates DNA damage response signaling. J Am Soc Nephrol. 2014;25(11):2573–83.
43. Nagamani SC, Erez A, Bay C, Pettigrew A, Lalani SR, Herman K, Graham BH, Nowaczyk MJ, Proud M, Craigen WJ, et al.
Delineation of a deletion regioncritical for corpus callosal abnormalities in chromosome 1q43-q44. Eur JHum Genet. 2012;20(2):176–9.44. Perlman SJ, Kulkarni S, Manwaring L, Shinawi M. Haploinsufficiency ofZNF238 is associated with corpus callosum abnormalities in 1q44 deletions.Am J Med Genet A. 2013;161A(4):711–6.45. McKinnon C, Tabrizi SJ. The ubiquitin-proteasome system inneurodegeneration. Antioxid Redox Signal. 2014;21(17):2302–21.46. Gwizdek C, Casse F, Martin S. Protein sumoylation in brain development, neuronalmorphology and spinogenesis. Neuromolecular Med. 2013;15(4):677–91.47. Moreno-De-Luca A, Helmers SL, Mao H, Burns TG, Melton AMA, Schmidt KR,Fernhoff PM, Ledbetter DH, Martin CL. Adaptor protein complex-4 (AP-4)deficiency causes a novel autosomal recessive cerebral palsy syndrome withmicrocephaly and intellectual disability. J Med Genet. 2011;48(2):141–4.48. Abou Jamra R, Philippe O, Raas-Rothschild A, Eck SH, Graf E, Buchert R,Borck G, Ekici A, Brockschmidt FF, Nothen MM, et al. Adaptor proteincomplex 4 deficiency causes severe autosomal-recessive intellectualdisability, progressive spastic paraplegia, shy character, and short stature.Am J Hum Genet. 2011;88(6):788–95.49. Vanoye CG, Gurnett CA, Holland KD, George Jr AL, Kearney JA. NovelSCN3A variants associated with focal epilepsy in children. Neurobiol Dis.2014;62:313–22.50. Kawaji H, Severin J, Lizio M, Waterhouse A, Katayama S, Irvine KM, Hume DA,Forrest AR, Suzuki H, Carninci P, et al. The FANTOM web resource: frommammalian transcriptional landscape to its dynamic regulation. Genome Biol.2009;10(4):R40.51. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A,Lochovsky L, Chen J, Harmanci A, et al. Integrative annotation of variantsfrom 1092 humans: application to cancer genomics. Science. 2013;342(6154):1235587.52. Firth HV, Wright CF, Study DDD. The Deciphering Developmental Disorders(DDD) study. Dev Med Child Neurol. 2011;53(8):702–3.53. 
Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B.Topological domains in mammalian genomes identified by analysis ofchromatin interactions. Nature. 2012;485(7398):376–80.54. Kleefstra T, Kramer JM, Neveling K, Willemsen MH, Koemans TS, Vissers LE,Wissink-Lindhout W, Fenckova M, van den Akker WM, Kasri NN, et al.Disruption of an EHMT1-associated chromatin-modification module causesintellectual disability. Am J Hum Genet. 2012;91(1):73–82.55. Heidari A, Tongsook C, Najafipour R, Musante L, Vasli N, Garshasbi M, Hu H,Mittal K, McNaughton AJ, Sritharan K, et al. Mutations in the histamineN-methyltransferase gene, HNMT, are associated with nonsyndromicautosomal recessive intellectual disability. Hum Mol Genet. 2015;24:5697–710.56. Gibson WT, Hood RL, Zhan SH, Bulman DE, Fejes AP, Moore R, Mungall AJ,Eydoux P, Babul-Hirji R, An J, et al. Mutations in EZH2 cause Weaversyndrome. Am J Hum Genet. 2012;90(1):110–8.57. Veltman JA, Brunner HG. De novo mutations in human genetic disease. NatRev Genet. 2012;13(8):565–75.58. van Bokhoven H. Genetic and epigenetic networks in intellectual disabilities.Annu Rev Genet. 2011;45:81–104.59. de Pagter MS, van Roosmalen MJ, Baas AF, Renkens I, Duran KJ,van Binsbergen E, Tavakoli-Yaraki M, Hochstenbach R, van der Veken LT,Cuppen E, et al. Chromothripsis in healthy individuals affects multipleprotein-coding genes and can result in severe congenital abnormalitiesin offspring. Am J Hum Genet. 2015;96(4):651–6.Zahir et al. BMC Genomics  (2017) 18:403 Page 16 of 16

