Open Collections

UBC Theses and Dissertations

Computer aided drug discovery tools and pipelines for molecule targeted inhibition of Topoisomerases… Alperstein, Zaccary 2018

Computer Aided Drug Discovery Tools and Pipelines for Small Molecule Targeted Inhibition of Topoisomerases I and II in Cancer

by

Zaccary Alperstein

B.Sc., University of Western Ontario, 2015

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Bioinformatics)

The University of British Columbia
(Vancouver)

October 2018

© Zaccary Alperstein, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Computer Aided Drug Discovery Tools and Pipelines for Small Molecule Targeted Inhibition of Topoisomerases I and II in Cancer

submitted by Zaccary Alperstein in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics.

Examining Committee:
Xuesen Dong, Co-Supervisor
Faraz Hach, Supervisory Committee Member
Ryan Brinkman, Chair

Additional Supervisory Committee Members:
Artem Cherkasov, Supervisor

Abstract

Computer Aided Drug Discovery (CADD) is a broad field which uses scientific tools from various disparate disciplines towards drug discovery and design. The tools of CADD include receptor based methods such as docking and Molecular Dynamics (MD), and ligand based methods such as Quantitative Structure Activity Relationship (QSAR) / Quantitative Structure Property Relationship (QSPR) approaches, which rely directly on the structure of small molecules. In this thesis both of these approaches to CADD have been used together to drug topoisomerases. There are two clinically relevant members of the Topoisomerase (TOP) protein family, TOP I and TOP II, and the tools of CADD were applied to both. For the former, several novel TOP I natural product-like inhibitors were developed and characterized with molecular dynamics simulations.
This led to the in-silico prediction of unique but strong non-covalent interactions in the binding site, as well as a rationalization of the difference in activity between two enantiomers. For TOP II, a broader CADD campaign was initiated to screen ~12 million molecules from the ZINC Is Not Commercial (ZINC)-15 database [1] against TOP II. This was facilitated by the implementation of consensus voting protocols from various virtual screening programs [2] and compositions of machine learning techniques. With a synergy between in-silico and wet-lab components, a rational drug discovery pipeline was executed to discover and characterize a highly potent inhibitor. The identified compound has been shown to inhibit topoisomerase in a Kinetoplast DNA (KDNA) decatenation assay with a nanomolar half-maximal inhibitory concentration (IC50). Furthermore, it has been demonstrated that the identified drug candidate does not act as a "DNA poison", as no linear Deoxyribonucleic Acid (DNA) is formed upon incubation with TOP II in a relaxation assay and no general DNA damage is observed. Finally, a mechanism of action for the lead compound is proposed, based on experimental and in-silico evidence.

Lay Summary

The goal of the work in this thesis is to use cutting edge computational tools to find new, more potent, and safer drugs to treat cancer. Many kinds of cancer cells rely on an enzyme called topoisomerase, which helps cancer cells proliferate. There are two important topoisomerase enzymes in humans, TOP I and TOP II, and so two projects were carried out to discover and characterize new molecules which inhibit these enzymes. In the first project, natural product based TOP I inhibitors were discovered, and the relationship between their molecular structure and their drug activity was elucidated. In the second project, inhibitors for TOP II were discovered and characterized.
These new inhibitors are likely to be safer than currently used drugs, as they do not display the type of toxic properties which are related to the side effects of modern chemotherapeutics.

Preface

Chapter three has been published [3]. It should be noted that all the wet-lab work, such as the relaxation experiments, organic synthesis, and cellular assays, was done by Yunrui Cai, Huajian Zhu, and Wenjun Yu under the supervision of Hongbin Zou at Zhejiang University. Chapter two is in preparation to be submitted to a scientific journal and a patent application has been filed. Some of this work was presented at the Canadian Cancer Research Conference, as well as the Bioinformatics, Interdisciplinary Oncology, and Genome Science Research Day. Additionally, the wet lab experiments discussed in chapter two were done by the Dong lab.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction Part 1: Topoisomerases and Cancer
  1.1 Topoisomerases
  1.2 Prostate Cancer
    1.2.1 Prostate Cancer Diagnosis
    1.2.2 CRPC
  1.3 Topoisomerases and their involvement in Cancer
    1.3.1 Topoisomerase II and Prostate Cancer
    1.3.2 Topoisomerase II Targeting Therapeutics
    1.3.3 Top I and Cancer
2 Introduction Part 2: Methods and Tools
  2.1 Receptor Based Methods
    2.1.1 Docking
    2.1.2 Molecular Dynamics Simulations
  2.2 Ligand Based Methods
    2.2.1 Ligand Similarity Based Methods
    2.2.2 Machine Learning in Drug Discovery
3 Applied Drug Discovery: Topoisomerase I
  3.1 Preface
  3.2 Abstract
  3.3 Introduction
  3.4 Results and Discussion
    3.4.1 Chemistry
    3.4.2 Study of the Favorable Configuration of C-3
    3.4.3 Structure-Activity Relationship
  3.5 Methods
    3.5.1 Topoisomerase I Docking of MIAs
    3.5.2 Topoisomerase I simulations
    3.5.3 Synthesis of N-Substituted THAs
    3.5.4 Topoisomerase I Inhibition Assay
    3.5.5 In Vitro Cytotoxicity Assay
    3.5.6 SAR of the S-enantiomer
4 Applied Drug Discovery: Topoisomerase II
  4.1 Preface
  4.2 Introduction
    4.2.1 Binding Site Identification
  4.3 First Round Screening
  4.4 Second Round Screening
  4.5 Wet lab Characterization of active compounds
  4.6 Third Round Predictions
    4.6.1 Dataset Preparation
    4.6.2 Decision Boundary Tightening
    4.6.3 Machine Learning Pipeline Development
  4.7 Methodology
    4.7.1 Methodological Details of in-silico Techniques
    4.7.2 Wet-Lab Experimental Methodology
  4.8 Results and Discussion
5 Conclusion
    5.0.1 Topoisomerase I
    5.0.2 Topoisomerase II
Bibliography
A Supporting Materials

List of Tables

Table 1.1 Topological Preferences of Topoisomerase enzymes.
Table 1.2 Gleason Score and its corresponding risk based on the National Comprehensive Cancer Network Risk Stratification Criterion.
Table 1.4 The AJCC TNM system with Gleason Score [4].
Table 1.6 Important small molecules which target eukaryotic topoisomerase. 1: ICRF-187 was approved for its ability to chelate iron.
Table 3.1 In Vitro Cytotoxic Activity of the THA Enantiomers against HepG2
Table 3.2 Antiproliferative Activity of the N-Substituted (S)-THA Derivatives against HepG2
Table 4.1 Decatenation assay based IC50 in nM.
Table 4.2 List of actives, relevant inactives, and partial actives for the SAR. Thresholding was done based on the decatenation intensities, where anything less than the decatenation intensity of compound 23 is classified as active, anything between compound 23 and compound 19 is inactive, while the rest are partially active. Variable groups map onto the molecule based on Figure 4.25.

List of Figures

Figure 1.2 Camptothecin hemiacetal based tautomerization.
Figure 1.1 Catalytic Cycle of Topoisomerase II. The steps at which Top II targets affect the cycle are shown. This figure was adapted from [5].
Figure 2.1 The ECFP fingerprint hashing process. Benzoic acid amide is shown with hashing at various iterations for atom 1. Atoms denoted with "A" have not been included in the hash yet [6].
Figure 2.2 The directed graphical model describing Naive Bayes.
Figure 3.1 Chemo-enzymatic Synthesis of THA
Figure 3.2 STR1 as a gateway enzyme for the biosynthesis of natural MIAs.
Figure 3.3 STR1 as a biocatalyst for potential Top1 inhibitor THAs.
Figure 3.4 Top1-mediated DNA cleavage assay. (A) Inhibition of Top1 relaxation activity at 200 and 500 µM. Lane 1, DNA alone; lane 2, DNA + Top1; lanes 3-16, DNA + Top1 + CPT or test compounds (7a, 8a, 7b, 8b, 7c, 8c). (B) Inhibition of Top1 relaxation activity at 200 µM. Lane 1, DNA alone; lane 2, DNA + Top1; lanes 3-21, DNA + Top1 + CPT or test compounds (7a-7r, respectively). (C) Inhibition of Top1 relaxation activity at 50, 100, and 200 µM. Lane 1, DNA alone; lane 2, DNA + Top1; lanes 3-11, DNA + Top1 + CPT or test compounds (7a, 7i).
Figure 3.5 Synthesis of N-substituted THA derivatives. Reagents and conditions: (a) Strictosidine Synthase (STR1), KPi buffer, pH 7.0, 5 °C; (b) 5% Na2CO3, 70 °C; (c) H2, Pd, rt; (d) glucosidase, acetate buffer, pH 5.0, 37 °C; (e) NH4OAc, EtOH, reflux; (f) p-TsOH, toluene, 90 °C; (g) acyl or alkyl halides, t-BuOK, THF, rt; (h) acetic acid, water, reflux
Figure 3.6 MD simulations of 7a and 8a in Top I (PDB ID: 1k4t). (A) Three evenly spaced snapshots taken from the simulations of 8a in the color-coded order of green, red, and purple. (B) Three evenly spaced snapshots taken of the simulation with compound 7a, in the color-coded order of green, red, and purple. (C) Final snapshot of the simulations of 8a. (D) Final snapshot of the simulation of 7a. In all of the pictures, the protein and most of the DNA have been removed for clarity.
Figure 3.7 (A) Superimposed docked complexes of 7i (green) and CPT (purple). (B) Comparison of the final snapshot of 7i (green) with CPT (purple) in a Top1-DNA complex. (C) Binding mode of compound 7i and 3D superimposed structure of the final simulated snapshot (grey). In PDB entry 1k4t, Met 230 and Lys 334 correspond to Met 428 and Lys 751, respectively.
Figure 4.1 The van der Waals surface of ICRF is shown in blue, with ICRF inside, with respect to the ATPase domain of TOP IIα.
Figure 4.2 RMSD of pocket residues from Amber simulation of Topoisomerase IIα. Residues used were: 432-434, 350-359, 321-323, 299-303, 286-289, 273-284, 223-231, and 219-222. Each frame represents 2500 steps at 2 femtoseconds each.
Figure 4.3 Summary of the CADD pipeline. (Top) Compound 23 is shown as the first hit discovered from the pipeline. This is fed back into the next pipeline for round 2, resulting in the discovery of compound 112, shown docked into the pocket (Bottom).
Figure 4.4 First round relaxation assay results. Compound concentrations are present at 5 µM. Here D denotes DMSO treated with topoisomerase, Etop denotes etoposide, and - denotes no topoisomerase added. Red font indicates active compounds.
Figure 4.5 First round kDNA decatenation assay results. Compound concentrations are present at 5 µM. Here D denotes DMSO, L linear DNA, and Etop etoposide.
Figure 4.6 Compound 19.
Figure 4.7 Summary of active and inactive compounds per round.
Figure 4.8 Compound 60.
Figure 4.9 Compound 112.
Figure 4.10 MTS assay with results for HeLa cells treated with compound 23. R1881 is an AR agonist and MDV is an AR antagonist; both are used here as controls, and no effect is expected.
Figure 4.11 BrdU assay with compound 60 and 110.
Figure 4.12 H2Aγ western blot. Cells were treated with etoposide (positive control), ICRF (negative control), or Top60 (compound 60). Concentrations are in µM.
Figure 4.13 Fluorescence Microscopy of HeLa cells treated with compound 60 or ICRF.
Figure 4.14 Pyridinothione tautomerisation. Left is the keto form and right is the enol.
Figure 4.15 Illustration of the curse of dimensionality. The ratio between the volume of a hypersphere and that of a hypercube, where the sphere is inside the cube, is shown as dimensionality increases. Here r = 1.
Figure 4.16 Accuracy of various machine learning classifiers for round 1 and 2.
Figure 4.17 F1 Score of various machine learning classifiers for round 1 and 2.
Figure 4.18 Average F1 Score from 10-fold CV for various machine learning classifiers for round 1 and 2.
Figure 4.19 MCC for various machine learning classifiers for round 1 and 2.
Figure 4.20 Kappa score for various machine learning classifiers for round 1 and 2.
Figure 4.21 Interaction Count Based Histogram. Counts are marked at every time point for each residue (below) and the total number of interactions are histogrammed above. Note the mutual exclusivity between GLN 789 and GLY 793.
Figure 4.22 The ligand RMSD and protein RMSD throughout the simulation. The ligand RMSD is shown in red and stabilizes after 22 nanoseconds; the protein RMSD is shown in blue.
Figure 4.23 Protein-Ligand Interaction Histogram of specific interaction types.
Figure 4.24 Protein-Ligand Interaction Diagram. Only interactions that occur more than 20% of the simulation are shown.
Figure 4.25 Active Scaffold. Table 4.2
Figure 4.26 Compound 88. This compound is relevant for the SAR; it is shown separately as it has a different scaffold than the molecules explored in Table 4.2.
Figure 4.27 Compound 89. This compound is relevant for the SAR; it is shown separately as it has a different scaffold than the molecules explored in Table 4.2.
Figure A.1 Second round decatenation results. Compound concentrations are present at 5 µM. Here Veh denotes DMSO and Etop etoposide.
Figure A.2 Second round decatenation results. Compound concentrations are present at 5 µM. Here Decat Marker denotes fully decatenated DNA from Top II treatment, kDNA denotes catenated DNA, Linear marker denotes linear DNA, and Etop denotes etoposide.
Figure A.3 Second round decatenation results. Compound concentrations are present at 5 µM. Here Decat Marker denotes fully decatenated DNA from Top II treatment, kDNA denotes catenated DNA, Linear marker denotes linear DNA, and Etop denotes etoposide.
Figure A.4 Second round decatenation results. Compound concentrations are present at 5 µM. Here Decat Marker denotes fully decatenated DNA from Top II treatment, kDNA denotes catenated DNA, and Etop denotes etoposide.
Figure A.5 Second round decatenation results. Compound concentrations are present at 5 µM. Here D denotes DMSO, L linear DNA, and Etop etoposide.
Figure A.6 Second round decatenation results. Compound concentrations are present at 5 µM. Here D denotes DMSO, L linear DNA, and Etop etoposide.
Figure A.7 Interaction frequency with key residues throughout the 100 ns simulation of Topoisomerase 1 (1k4t) with compound 7a. The colors are coded as follows: purple represents hydrophobic interactions, blue represents water bridges, and green represents hydrogen bonds. Interactions were analyzed with Schrödinger's simulation interactions script.
Figure A.8 Protein-ligand interactions. Interactions present for over 30% of the simulation are visualized. Note the water bridge present for most of the dynamics. Interactions were analyzed with Schrödinger's simulation interactions script.
Figure A.9 Compound 83. This compound is relevant for the SAR; it is shown separately as it has a different scaffold than the molecules explored in Table 4.2.

Glossary

ADMET  Absorption Distribution Metabolism Excretion Toxicity
ADT  Androgen Deprivation Therapy
AR  Androgen Receptor
ATP  Adenosine Triphosphate
BRDU  Bromodeoxyuridine
CADD  Computer Aided Drug Discovery
CHO  Chinese Hamster Ovary
CPT  Camptothecin
CRPC  Castration Resistant Prostate Cancer
CV  Cross Validation
DAPI  4',6-diamidino-2-phenylindole
DNA  Deoxyribonucleic Acid
DMSO  Dimethyl sulfoxide
ds  Double Stranded
EDTA  Ethylenediaminetetraacetic acid
FDA  US Food and Drug Administration
fs  femtosecond
IC50  Half-Maximal Inhibitory Concentration
IID  Independent and Identically Distributed
K  Kelvin
KDNA  Kinetoplast DNA
KNN  K Nearest Neighbours
MD  Molecular Dynamics
MIA  Monoterpenoid Indole Alkaloids
ML  Machine Learning
MLL  Mixed Lineage Leukemia
MOE  Molecular Operating Environment
MTS  3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium
MTT  3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide
ns  nanosecond
NB  Naive Bayes
pTsOH  p-toluenesulfonic acid
PCa  Prostate Cancer
PDB  Protein Data Bank
PIN  Pre Neoplastic Foci
PSA  Prostate Specific Antigen
PS  Pictet-Spengler
ps  picosecond
QSAR  Quantitative Structure Activity Relationship
QSPR  Quantitative Structure Property Relationship
RBF  Radial Basis Function
RMSD  Root Mean Squared Deviation
RF  Random Forests
SAR  Structure Activity Relationship
SDS  Sodium dodecyl sulfate
STR1  Strictosidine Synthase
SVM  Support Vector Machine
t-AML  Therapy-Related Acute Myelocytic Leukemia
tBuOK  potassium tert-butoxide
TOP  Topoisomerase
THA  3,14,18,19-Tetrahydroangustine
wt  wild type
ZINC  ZINC Is Not Commercial

Acknowledgments

I would like to acknowledge my principal thesis supervisor Artem Cherkasov for his help with the in-silico work, and my co-supervisor Xuesen Dong for his help in guiding the project and pushing it forward with wet-lab experiments. I would like to thank my thesis committee, who have been very accommodating towards my deadlines.
I would also like to acknowledge my family, and last but not least Alla Pryyma, for always being there for me and proofreading my writing.

Chapter 1
Introduction Part 1: Topoisomerases and Cancer

1.1 Topoisomerases

During cell division DNA has to replicate rapidly, at an average speed of 2 kilobases per minute. This imposes strict requirements on DNA topology, where supercoiled secondary structures are highly unfavorable [7]. The role of topoisomerase in cells is to relieve supercoiling in DNA so that it can be further manipulated. A DNA supercoil is a tertiary topological structure which DNA forms in-vivo and which can be coiled in the positive or negative direction. DNA positive supercoiling, or over-winding, happens when the replication complex pulls open the DNA strands, forcing the DNA downstream to tighten. Over-winding makes DNA inaccessible to proteins, and so DNA is maintained in-vivo in an under-wound, or negatively supercoiled, state.

Mammalian cells express six topoisomerase genes: two Topoisomerase (TOP) I (TOP I and TOP I mitochondrial), two TOP II (TOP IIα and TOP IIβ), and two TOP III (TOP IIIα and TOP IIIβ) [8]. Topoisomerases all relax supercoiled DNA; however, each has its own preference for the type of DNA topology which it relaxes, and this varies as a function of the cell cycle (Table 1.1).

Table 1.1: Topological Preferences of Topoisomerase enzymes.

Enzyme     Positive  Negative
Top I      X         X
Top IIα    X         X
Top IIβ    X         X
Top IIIα   X         X
Top IIIβ   X         X

TOP IIβ is constitutively active, taking care of general DNA maintenance, which is clear from its ability to relax both positively and negatively supercoiled DNA (Table 1.1) [8]. However, TOP IIα is only active during mitosis and DNA replication, likely because it relaxes positively supercoiled DNA. All TOP enzymes relax DNA by the formation of a covalent complex with DNA [8].
Generally, a nucleophilic tyrosine attacks the electrophilic DNA phosphodiester bond, transesterifying the DNA backbone onto the protein tyrosine and thereby breaking the strand [9]. However, while TOP I makes a single stranded break at the 3' end, TOP II makes a double stranded break at the 5' end. After nicking, TOP I allows for a controlled rotation of the DNA around the backbone in an Adenosine Triphosphate (ATP) independent way [9]. TOP II, on the other hand, passes a Double Stranded (ds) DNA segment, known as the transported or T-segment, through a ds DNA break in the gate or G-segment, in an ATP and Mg2+ dependent way [5]. Removing positive supercoils is necessary for replication and transcription to progress; otherwise so much supercoiling energy accumulates in the DNA that it cannot be melted by a helicase. As topoisomerases make breaks in DNA in order to overcome supercoiling, they are in a particularly precarious situation: if they cannot execute their task perfectly, ds DNA breaks persist. This may happen if a small molecule stabilizes the cleaved complex, precluding the topoisomerase from resealing the DNA.

1.2 Prostate Cancer

Prostate Cancer (PCa) is the most common cancer in men, and second to melanoma as the leading cause of cancer related death in elderly males, particularly in western societies [10] [11]. On average 24,000 Canadian men will be diagnosed with PCa, and of those around 5000 will die [12]. PCa is a disease of the prostate, a walnut sized gland. The prostate is made up of three distinct morphological zones: the peripheral zone, the transition zone, and the central zone. Benign prostatic hyperplasia, a non-malignant growth, is found in the transition zone, while PCa is mainly found in the peripheral zone [13]. The prostate's main function is to secrete seminal fluid and Prostate Specific Antigen (PSA), both of which help sperm persist in the vagina.
In addition, it is also used to filter out toxins, and to help regulate the production of testosterone. As the prostate cells, specifically the luminal cells, express the androgen receptor, they have a particularly high affinity for hormones such as testosterone, and so it is not surprising that this gland is a hot-bed for cancer growth [14],[15]. When confined to the prostatic capsule, prostate carcinoma is mostly curable by surgery or radiation therapy. However, when it is metastatic, it must be treated with a more radical approach such as Androgen Deprivation Therapy (ADT), which most often results in recurrence with a more aggressive cancer, known as Castration Resistant Prostate Cancer (CRPC).

The etiology of prostate cancer is challenging to study as it is a very heterogeneous disease, where a single tissue will have a polyclonal genotype. However, there are some events that seem to occur frequently enough to be involved in the development of PCa. Pre Neoplastic Foci (PIN) are always a precursor of PCa; however, PIN is also present in healthy individuals, at least transiently, so it might not be an absolute predictor of PCa [16].
In 80% of PCa tumors an early event in disease progression is the loss of chromosome 8p, as shown by fluorescence in-situ hybridization studies [17], and it is believed that NKX3.1, a gene located on this chromosome whose expression is restricted to prostate cancer cells, plays a large role in disease progression [18].

1.2.1 Prostate Cancer Diagnosis

Table 1.2: Gleason Score and its corresponding risk based on the National Comprehensive Cancer Network Risk Stratification Criterion.

Risk          Gleason Score
Very low      Gleason score ≤ 6, PSA < 10 ng/mL, fewer than 3 biopsies with ≤ 50% cancer, and PSA density < 0.15 ng/mL/g
Low           Gleason score ≤ 6, and PSA < 10 ng/mL
Intermediate  Gleason score of 7 or PSA 10-20 ng/mL
High          Gleason scores of 8-10 or PSA > 20 ng/mL
Very High     greater than 4 biopsies with Gleason scores of 8-10

Biopsy of the tumour is necessary to grade it, which gives clinicians an idea of its progression and prognosis. The most popular grading scheme is the Gleason score. A Gleason score is assigned after examining histological slides and observing the particular microscopic formations the cells make. The score consists of two grades, each ranging from 1 to 5: the primary grade is given to the cell pattern covering the largest area of the tumour, and the secondary grade to the second largest cell group. A larger number means the tissue is more cancer-like; for example, a score of 3+2 represents low or very low risk (Table 1.2) [19],[20]. In addition, serum PSA is also used to aid diagnosis. Additionally, tumors may be staged based on the TNM system (Table 1.4).
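The grading and stratification described above can be sketched in a few lines of code. This is only a minimal illustration, assuming the low, intermediate, and high thresholds of Table 1.2 (Gleason score of 6 or less and PSA below 10 ng/mL; Gleason score of 7 or PSA of 10-20 ng/mL; Gleason score of 8-10 or PSA above 20 ng/mL). The function names gleason_sum and risk_group are hypothetical, and the very low and very high categories are left out because they also depend on biopsy counts and PSA density.

```python
def gleason_sum(primary: int, secondary: int) -> int:
    """Add the primary and secondary Gleason grades (each between 1 and 5)."""
    for grade in (primary, secondary):
        if not 1 <= grade <= 5:
            raise ValueError(f"Gleason grade must be between 1 and 5, got {grade}")
    return primary + secondary


def risk_group(primary: int, secondary: int, psa_ng_ml: float) -> str:
    """Return a simplified risk group from the Gleason grades and serum PSA.

    Thresholds are assumptions taken from Table 1.2; the very low and
    very high groups are not modelled here.
    """
    if psa_ng_ml < 0:
        raise ValueError("PSA cannot be negative")
    score = gleason_sum(primary, secondary)
    # High risk: Gleason 8-10 or PSA above 20 ng/mL.
    if score >= 8 or psa_ng_ml > 20:
        return "high"
    # Intermediate risk: Gleason 7 or PSA between 10 and 20 ng/mL inclusive.
    if score == 7 or psa_ng_ml >= 10:
        return "intermediate"
    # Otherwise Gleason is 6 or less and PSA is below 10 ng/mL.
    return "low"


if __name__ == "__main__":
    print(risk_group(3, 2, 4.0))
    print(risk_group(3, 4, 6.5))
    print(risk_group(4, 4, 6.5))
```

Under these assumed thresholds, the 3+2 example from the text with a low serum PSA falls into the low group, while either a higher Gleason sum or a higher PSA alone is enough to move a case up a group.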
New imaging technology such as diffusion-weighted, dynamic contrast-enhanced magnetic resonance imaging demonstrates high specificity and sensitivity, especially when combined with ultrasound [21].

Table 1.4: The AJCC TNM system with Gleason Score [4].

AJCC Stage  Stage Grouping                          Gleason Score and PSA
I           cT1, N0, M0, Grade Group 1              Gleason score ≤ 6, PSA < 10
I           cT2a, N0, M0, Grade Group 1             Gleason score ≤ 6, PSA < 10
I           pT2, N0, M0, Grade Group 1              Gleason score ≤ 6, PSA < 10
IIA         cT1, N0, M0, Grade Group 1              Gleason score ≤ 6, 20 > PSA ≥ 10
IIA         cT2b or cT2c, N0, M0, Grade Group 1     Gleason score ≤ 6, 20 > PSA
IIB         T1 or T2, N0, M0, Grade Group 2         Gleason score 3 + 4 = 7, 20 > PSA
IIC         T1 or T2, N0, M0, Grade Group 3 or 4    Gleason score 4 + 3 = 7 or 8, 20 > PSA
IIIA        T1 or T2, N0, M0, Grade Group 1 to 4    Gleason score ≤ 8, 20 ≤ PSA
IIIB        T4 or T3, N0, M0, Grade Group 1 to 4    Gleason score ≤ 8, Any PSA
IIIC        Any T, N0, M0, Grade Group 5            Gleason score 9 or 10, Any PSA
IVA         Any T, N1, M0, Any Grade Group          Any PSA
IVB         Any T, Any N, M1, Any Grade Group

1.2.2 CRPC

As previously mentioned, PCa may metastasize and progress into a resistant form. Although ADT is a well known treatment, it often has a poor outcome, and so there is a lot of interest in augmenting current treatments, or even finding treatment alternatives [22]. For example, it has been found that pre-treatment with docetaxel or cabazitaxel before ADT is able to further slow the progression of CRPC [21]. Once CRPC has developed, there are few options available for treatment; however, some modern drugs exist, such as abiraterone and enzalutamide, which inhibit androgen biosynthesis and androgen receptor signaling respectively. CRPC progression is not fully understood, but there seem to be some known causal factors. One way in which CRPC may occur is through androgen hypersensitivity. After androgen ablation therapy, very low levels of androgens may still exist, so these cells have to become hypersensitive.
One way in which this could happen is through Androgen Receptor (AR) amplification, or alternatively AR may just become hyperactive due to mutations. Another possible mechanism is promiscuity, where AR becomes mutated and recognizes non-androgens as agonists. Thirdly there is the outlaw pathway, where steroid hormone receptors may be activated by ligand independent mechanisms. Finally there is the bypass pathway, where AR activation may be completely bypassed for cellular proliferation [23].

1.3 Topoisomerases and their involvement in Cancer

1.3.1 Topoisomerase II and Prostate Cancer

TOP IIβ has been shown to be responsible for ds DNA breaks on both estrogen target genes in breast cancer cell lines [24] and AR target genes in prostate cancer cell lines [25]. It is well known that genomic rearrangements are some of the hallmarks of cancer. A number of rearrangements involved in prostate cancer occur on AR regulated genes. In nearly 50% of prostate cancer instances, the TMPRSS2-ERG fusion gene is present. Furthermore, Yegnasubramanian and colleagues [25] were able to show that upon androgen stimulation, AR is co-recruited with TOP IIβ at regulatory regions of AR target genes, causing ds DNA cleavage events. Additionally, these cleavage complexes were shown to be highly recombinogenic, leading to de-novo TMPRSS2-ERG fusion transcripts in a TOP IIβ dependent manner. Finally, it was shown that etoposide treatment increased the amount of stabilized cleavage complexes, leading to more TMPRSS2-ERG fusions, while the catalytic inhibitor merbarone and TOP IIβ siRNA decreased these gene fusion events [25]. Furthermore, Li and colleagues [26] showed that TOP IIβ is highly expressed in PC cell lines, and the catalytic inhibitor ICRF was able to inhibit xenograft tumor growth in CRPC models, with lower efficacy in pre-CRPC models. Given the role of topoisomerases in cell proliferation, it is not a surprise that they are involved in cancer. For a cancer to replicate quickly it must express high TOP II levels to constantly decatenate the replicated DNA. Indeed, it has been shown in breast cancer patient tissue extract that high TOP II expression is correlated with a more aggressive form of cancer [27].

Table 1.6: Important small molecules which target eukaryotic topoisomerase. 1: ICRF-187 was approved for its ability to chelate iron.

Compound      Poison  Inhibitor  FDA approval                  Top II or Top I
Etoposide     X       X          X                             Top II
Doxorubicin   X       X          X                             Top II
Amsacrine     X       X          X                             Top II
Teniposide    X       X          X                             Top II
Genistein     X       X          X                             Top II
Irinotecan    X       X          X                             Top I
Topotecan     X       X          X                             Top I
Diflomotecan  X       X          phase II                      Top I
ICRF          X       X          X, for cardio-protection (1)  Top II
Aclarubicin   X       X          X                             Top II
Suramin       X       X          X                             Top II
Mitoxantrone  X       X          X                             Top II
Merbarone     X       X          X                             Top II
BNS 22        X       X          X                             Top II

1.3.2 Topoisomerase II Targeting Therapeutics

Topoisomerases are well-known cancer therapeutic targets. Indeed, etoposide was introduced into the clinic in 1971 and remains one of the most popular chemotherapeutics for a large variety of solid and hematological tumors. Other popular topoisomerase targeting US Food and Drug Administration (FDA) approved drugs are doxorubicin, amsacrine, camptothecin, teniposide, genistein and the fluoroquinolone drug family, which differs from the other molecules in that it targets bacterial topoisomerase. These molecules are highly anti-proliferative; however, their activity is a double-edged sword, as most of these molecules fall under the class of topoisomerase inhibitors known as poisons, which are known to be toxic (Table 1.6). Poisons stabilize ds DNA breaks by binding to the TOP II DNA covalent complex, leading to cancer cell death. Unfortunately, poisons are also associated with harsh side effects, and in some cases even secondary leukemia such as therapy-related Acute Myelocytic Leukemia (t-AML) [28]. Etoposide induced t-AML is likely caused by TOP II mediated cleavage of the Mixed Lineage Leukemia (MLL) gene and its translocation with other genes.
Indeed, the MLL gene is cleaved at A-T rich sequences with Alu motifs, a recognition site of TOP II [29]. Unfortunately, the majority of the most popular chemotherapeutics have toxic effects. As there are multiple topoisomerases active in the cell, it is natural to ask whether one should be inhibited over another. TOP II β is constitutively active; however, cancer cells may depend more on TOP II α, which is used principally for cell replication. Furthermore, it is believed possible that the off-target toxic effects of poisons may be in part due to targeting of the β isoform in all cell types. In a landmark study, Azarova and colleagues [30] took mice with a TOP II β skin knockout and treated them with mutagens to induce melanoma. They found that the melanoma could be treated with etoposide in the knockout mouse, while in the wild type (wt) mouse the treatment led to an increase in the incidence of these tumors. Furthermore, it was postulated that TOP II β leads to an increased incidence of dsDNA breaks because of proteasome directed degradation. In support of this, it was found that treatment of the wt mice with a proteasome inhibitor in conjunction with etoposide led to a decrease in DNA damage.

Another class of TOP inhibitors is known as catalytic inhibitors. These do not stabilize dsDNA breaks, yet still inhibit TOP II, and are therefore highly valued. Unfortunately, they have had poor translation to the clinic [31]. Indeed, it is well known that TOP II inhibitors generally have potent in vitro inhibition with poor biological translation [5]. This could be due to a lack of cellular drug uptake, metabolic inactivation, or cytoplasmic drug sequestration. The bis-dioxopiperazine family of molecules (ICRF) are catalytic inhibitors of TOP II. They exert their activity by binding to the ATPase dimerization interface of TOP II at the GHKL domain in the nucleotide bound state [32]. Although it is a potent catalytic inhibitor, ICRF is known to be metabolized in vivo into an iron chelator.
It is not necessary that a catalytic inhibitor bind to TOP II: aclarubicin acts as an inhibitor by intercalating DNA and preventing the binding of TOP II. This type of catalytic inhibitor is undesirable because it may interfere with other proteins that manipulate DNA. In addition, it does cause DNA damage through a non-TOP pathway [5]. Catalytic inhibitors have been found to prevent cell division by causing chromosome misalignment and mis-segregation. This results in the swelling of the nucleosome, as it is unable to divide due to checkpoint inhibitors which are active from the lack of available TOP II [5][33]. There is also evidence that the potency of catalytic inhibitors is not just due to the inhibition of mitosis, but that they may also interfere with transcription or other DNA metabolic processes. Evidence for this comes from the observation that G1 arrest in yeast cells over-expressing TOP II α does not prevent apoptosis induced by ICRF-193 [34].

Inhibitors are believed to stall the cell cycle at the decatenation checkpoint [35]. This checkpoint was initially described in the examination of the mechanism of action of ICRF-193 and is activated upon induction of cell cycle arrest in G2. This checkpoint differs from the DNA damage checkpoint in that it is independent of ataxia-telangiectasia, and does not involve phosphorylation of the hChk1 or hChk2 kinases. However, ICRF does not activate all the same checkpoints in different cell lines; for example, the metaphase checkpoint is activated in HeLa cells, but not in Chinese Hamster Ovary (CHO) cells [36][37]. Currently ICRF-187 is an approved drug due to its ability to protect from the cardiotoxicity caused by anthracyclines (for example doxorubicin).
This toxicity is due to anthracyclines complexing with iron, leading to hydroxyl radicals. ICRF-187 is hydrolyzed to ADR-925, an EDTA analogue which chelates either free iron or anthracycline bound iron [38].

We hypothesized that a catalytic inhibitor of TOP II, which targets a novel pocket not susceptible to the problems of currently used catalytic inhibitors, would be a preferable alternative to topoisomerase poisons in the treatment of TOP II dependent cancers such as leukemia, lung cancer, and prostate cancer.

Figure 1.1: Catalytic Cycle of Topoisomerase II. The steps at which Top II targets affect the cycle are shown. This figure was adapted from [5].

1.3.3 Top I and Cancer

Camptothecins, discovered by Monroe Wall and co-workers from the Chinese tree Camptotheca acuminata, were found to kill tumor cells by poisoning Topoisomerase I [39]. There are two clinically used TOP I inhibitors, irinotecan and topotecan, which are primarily used for the treatment of ovarian and small cell lung cancers. Unfortunately, camptothecins are also not without their side effects, as they can cause hematological toxicity due to toxicity towards bone marrow progenitor cells, nausea, susceptibility to infections, and more. Furthermore, camptothecins are quickly biologically inactivated due to E-ring hydrolysis of the lactone to the carboxylate through the hemiacetal intermediate formed by ring-chain tautomerism (Figure 1.2). However, compounds have been discovered which overcome this, such as homocamptothecins, which have a 7-membered lactone ring [40][41].

Figure 1.2: Camptothecin hemiacetal based tautomerization.

Chapter 2
Introduction Part 2: Methods and Tools

2.1 Receptor Based Methods

2.1.1 Docking

Docking is the most popular receptor based drug discovery method in use. There exist both open source and proprietary docking programs. The most popular open source package is Autodock [42].
However, licensed proprietary software such as Gold and Glide is reported to perform better, and so Glide has been used in the work herein [43]. A docking algorithm can be coarsely broken into two steps:

1. Ligand-Receptor Conformational Discovery
2. Scoring

Ligand-Receptor Conformational Discovery is the process of searching through feasible ligand conformations, while scoring is used to rank these conformations to find the most likely candidate. Step 1 is generally completed with some variant of Markov Chain Monte-Carlo sampling in combination with some heuristic that directs conformational sampling into a physically reasonable space. Step 2 is a thorough evaluation of the favourability of that configuration. It is meant to model an energy function, but since the true quantum mechanical energy function is inaccessible, a hand-tuned empirical scoring function is used. This scoring is done by assigning 'atom-types' to each atom in a molecule. There are usually more atom-types than there are elements, because atoms in different chemical contexts behave differently; for example, a carbon with four single bonds, known as an sp3 carbon, behaves very differently from a carbon in a benzene ring, known as sp2. Classifying an atom with a given atom-type characterizes its partial charge, atomic radius, number of hydrogen bond donors/acceptors, and other characteristics heuristically deemed important.

To speed things up, the interaction between a given ligand atom-type and receptor atom-type is never calculated in situ, but is pre-calculated beforehand and then retrieved by lookup during docking. This is done by generating a three dimensional interaction grid between the receptor and the ligand for every atom-type. Each point in these grids represents the interaction potential between the probe atom-type and the atoms of the receptor macromolecule [44].
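The grid pre-calculation described above can be sketched in a few lines. This is only a minimal illustration, not the scheme used by any particular docking program: the inverse-power pair potential, its constants `a` and `b`, the distance floor `d_min`, and the function names `build_grid` and `lookup` are all assumptions made for the example.

```python
import math

def pair_potential(d, a=1.0e4, b=1.0e2):
    """Illustrative inverse-power pair potential; a and b are made-up constants."""
    return a / d**12 - b / d**6

def build_grid(receptor_atoms, origin, spacing, shape,
               potential=pair_potential, d_min=0.5):
    """Pre-compute the probe energy at every grid point for one atom-type.

    receptor_atoms: list of (x, y, z) coordinates of receptor atoms.
    Returns a dict mapping integer grid indices (i, j, k) to the summed energy.
    """
    grid = {}
    nx, ny, nz = shape
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                energy = 0.0
                for atom in receptor_atoms:
                    # Floor the distance so a grid point on top of an atom
                    # does not divide by zero (an assumption of this sketch).
                    d = max(math.dist(point, atom), d_min)
                    energy += potential(d)
                grid[(i, j, k)] = energy
    return grid

def lookup(grid, origin, spacing, position):
    """Retrieve the pre-computed energy at the grid point nearest to position."""
    index = tuple(round((p - o) / spacing) for p, o in zip(position, origin))
    return grid[index]

# One receptor atom at the origin, three grid points along the x axis.
receptor = [(0.0, 0.0, 0.0)]
grid = build_grid(receptor, origin=(1.0, 0.0, 0.0), spacing=1.0, shape=(3, 1, 1))
energy_near_2 = lookup(grid, (1.0, 0.0, 0.0), 1.0, (2.1, 0.0, 0.0))
```

The expensive sum over receptor atoms is paid once per grid point, so scoring a ligand atom during docking reduces to a dictionary lookup. Real programs typically interpolate between neighbouring grid points rather than snapping to the nearest one.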
The potential functions characterize the energy of the atom-types in the receptor at any given point in the grid, for example:

\[ E_{lj} = \frac{A}{d^{12}} - \frac{B}{d^{6}} \tag{2.1} \]

is the common 12-6 Lennard-Jones potential, where A and B are interaction specific constants, the '12' term describes Pauli-exclusion based repulsion between electrons being too close at the same energy levels, and the '6' term is attractive at long ranges and describes the Van der Waals dispersion force. Another example of a potential function is the electrostatic potential, which is based on the partial charges of the interacting atom-types. To get the full potential at a single grid point, these potentials are simply added together:

\[ E_{xyz} = E_{lj} + E_{electrostatic} + E_{hbond} + \dots \tag{2.2} \]

Often these are not faithful energy functions and instead involve heuristics which make them into scoring functions. The most favourable conformations, that is, the ones with the lowest energy or highest score, are then kept and used as the docking pose. In this thesis the docking programs Glide and eHiTS are extensively used due to their different, but complementary, docking schemes.

Briefly, Glide works in four stages (with quite a few sub-stages):

1. Site-point search
2a. Diameter test
2b. Subset test
2c. Greedy score
2d. Refinement
3. Grid Minimization and Monte-Carlo
4. Final Scoring

As a pre-processing step, Glide generates grids as previously mentioned. Additionally, Glide carries out an exhaustive ligand conformational search, eliminating unlikely conformations based on heuristics. For each conformation, Glide exhaustively searches receptor-ligand conformations. This is done with site-points, which are a grid of points around the receptor.
A histogram is made by binning the distances between these points and the receptor. Something similar is then done for the ligand, where histograms are made of the distances from the center of the ligand to its surface. As the site-points on the receptor can be thought of as representing an optimally docked ligand, the histograms of the receptor site-points and the ligand are compared, and the ones with the most overlap are used as candidate docking sites. This search space is not as large as the original conformational search space, and can be searched exhaustively. The Diameter test is then used to evaluate the placement of atoms that lie within a pre-specified distance of the ligand diameter, which is used to orient the ligand; if many steric clashes are found, the pose is rejected. The Subset test consists of rotating the ligand about its diameter and scoring hydrogen-bonding and other interactions. If this score is good, Greedy scoring is done, where all the interactions with the receptor are scored and the best poses are kept. After this, Refinement of the ligand conformation is done and it is re-scored. Finally, minimization with the OPLS-AA force field is done with the selected conformations, and additional Monte-Carlo moves are sampled to explore torsional angles. The last step consists of scoring these final conformations with Glide's proprietary scoring function:

\[ \Delta G_{bind} = C_{lipo-lipo} \sum f(r_{lr}) + C_{hbond-neut-neut} \sum g(\Delta r)\, h(\Delta\alpha) + C_{hbond-neut-charged} \sum g(\Delta r)\, h(\Delta\alpha) + C_{hbond-charged-charged} \sum g(\Delta r)\, h(\Delta\alpha) + C_{max-metal-ion} \sum f(r_{lm}) + C_{rotb} H_{rotb} + C_{polar-phob} V_{polar-phob} + C_{coul} E_{coul} + C_{vdw} E_{vdw} + \text{solvation terms} \tag{2.3} \]

where $h, f, g \in [0,1]$ are functions that characterize the deviation from the optimal distance or angle for an interaction. For example, $g(\Delta r)$ is 1.00 if the H-X hydrogen bond distance is within 0.25 Å of the optimal bond length, and decreases linearly as the distance gets farther.
Similarly, $h(\Delta\alpha)$ characterizes the difference between the optimal angle for some interaction and the angle in question. Also, r, l, m stand for receptor, ligand, and metal respectively, so $\sum f(r_{lm})$ means the sum over all ligand-metal atom pairs [45].

eHiTS works by making different assumptions about the ligand-receptor interaction geometry, which allows an exhaustive enumeration over docking conformations to try to find the global optimum. The ligand is first split into rigid structures, which are then made into polyhedra; the receptor is also made into concave polyhedra. The vertices of these polyhedra represent chemical features (like the atom-types explained above). Exhaustive matching of the ligand fragments to the receptor fragments is then done by fitting the polyhedra, with only reasonable fits kept. Then, to link the molecule back together again, the independent fragments need to be re-connected without perturbing their docked poses; a hyper-graph clique detection algorithm is used for this [46]. Simple but fast scoring functions are used throughout this process to narrow down conformation candidates, and at the end an empirical scoring function, similar to that shown for Glide but simpler, is used to find the most likely pose [47].

2.1.2 Molecular Dynamics Simulations

Molecular Dynamics Theory

Molecular Dynamics (MD) is similar to docking in that quite often a ligand-receptor based system is modelled and used to make qualitative conclusions about the favourability of a ligand in its receptor. However, these two methods differ in two main ways. Firstly, MD methods evolve a system as a function of time according to some recursive mapping:

\[ X_{t+1} = f(X_t) \tag{2.4} \]

MD is useful to learn about a system down to its atomic detail. If the process of interest is on a time-scale one is able to simulate in MD, then it can be a very informative tool to learn about a system.
However, it is not scalable in the way that docking is, bringing us to the second main difference between docking and MD: scalability. As molecular dynamics propagates the system in time, the time step must be a small fraction of the period of the fastest motion in the system, which is the vibration of bonds to hydrogen. These bonds vibrate with a period on the order of 10 femtoseconds (fs), limiting our time step to about 1 fs if we want to model the dynamics accurately. This limitation does not exist in docking, as nothing evolves over time. Therefore we are limited to studying only a handful of systems with MD, whereas we can study millions in a relatively short period of time by docking.

To run an MD simulation we need two components: the first is a force-field, and the second an integration algorithm. MD is a classical physics simulation, based on Newtonian mechanics, where we calculate force with Newton's second law:

\[ F = ma \tag{2.5} \]

where $a \in \mathbb{R}^3$. The force-field is really just the set of equations used to calculate the energy of a system. This is very much like the empirical scoring function used in docking (Equation 2.2); indeed, the concept of atom-types mentioned earlier is used in the same way here. A typical force field might look something like this:

\[ U = \sum_{bonds} \frac{1}{2} k_b (r - r_0)^2 + \sum_{angles} \frac{1}{2} k_\theta (\theta - \theta_0)^2 + \sum_{torsions} \frac{V_n}{2}\left[1 + \cos(n\phi - \delta)\right] + \sum_{improper} V_{imp} + \sum_{LJ} 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \sum_{elec} \frac{q_i q_j}{r_{ij}} \tag{2.6} \]

Here $r$ are distances, $\theta$, $\phi$ and $\delta$ are angles, $k$, $\epsilon$, $\sigma$ are constants that are fit empirically, $\epsilon_{ij}$, $\sigma_{ij}$, $r_{ij}$ are interaction parameters for atom i and atom j which are taken from the atom-type, and finally $q_i$ is the partial charge of atom i. Using the work-energy theorem we can write:

\[ F = -\frac{\partial U}{\partial r_i} \tag{2.7} \]

which connects our energy calculation to the integration algorithm through acceleration, which will be discussed next. The integration algorithm tells us how to propagate our system after each time step with Newton's equations.
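To make the link between the energy function and the integrator concrete, the following sketch evaluates the force as the negative numerical derivative of a potential energy (Equation 2.7) and converts it into an acceleration with Newton's second law (Equation 2.5). It is a one-dimensional toy under stated assumptions: only the harmonic bond term of Equation 2.6 is used, the constants `k_b` and `r0` are arbitrary, and the function names are hypothetical.

```python
def harmonic_bond_energy(r, k_b=100.0, r0=1.0):
    """Bond term of the force field, U = 1/2 * k_b * (r - r0)^2 (arbitrary units)."""
    return 0.5 * k_b * (r - r0) ** 2

def force(energy_fn, r, h=1e-6):
    """F = -dU/dr, approximated with a central finite difference of step h."""
    return -(energy_fn(r + h) - energy_fn(r - h)) / (2.0 * h)

def acceleration(energy_fn, r, mass):
    """a = F / m, the quantity handed to the integration algorithm."""
    return force(energy_fn, r) / mass

# A bond stretched 0.1 beyond its rest length is pulled back towards r0.
f_stretched = force(harmonic_bond_energy, 1.1)
a_stretched = acceleration(harmonic_bond_energy, 1.1, mass=2.0)
```

In production MD codes the derivative of each force-field term is coded analytically rather than taken numerically; the finite difference is used here only because it keeps the correspondence with Equation 2.7 explicit.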
A simple example of how we might propagate our system can be demonstrated when acceleration is constant:

\[ a = \frac{dv}{dt} \tag{2.8} \]
\[ v = \int a \, dt = at + v_0 \tag{2.9} \]
\[ v = \frac{dx}{dt} \tag{2.10} \]
\[ x = \int v \, dt + x_0 \tag{2.11} \]
\[ x = \frac{1}{2} a t^2 + v_0 t + x_0 \tag{2.12} \]

Because $a = -\frac{1}{m}\frac{dU}{dr}$, all we need to get the trajectory is some initial velocity and position. A more realistic integration algorithm, called the Verlet algorithm, is used in practice and is shown below in Equation 2.19 [48]. These are all derived by taking a Taylor series expansion of some function, like acceleration or position, from our current time step to our next time step:

\[ r(t + \delta t) = r(t) + v(t)\,\delta t + \frac{1}{2} a(t)\,\delta t^2 + O(\delta t^3) \tag{2.14} \]
\[ r(t - \delta t) = r(t) - v(t)\,\delta t + \frac{1}{2} a(t)\,\delta t^2 + O(\delta t^3) \tag{2.15} \]
\[ v(t + \delta t) = v(t) + a(t)\,\delta t + \frac{1}{2} b(t)\,\delta t^2 + O(\delta t^3) \tag{2.16} \]
\[ a(t + \delta t) = a(t) + b(t)\,\delta t + O(\delta t^2) \tag{2.17} \]

Truncating higher order terms and summing the position equations we get:

\[ r(t + \delta t) = 2 r(t) - r(t - \delta t) + a(t)\,\delta t^2 \tag{2.19} \]

Writing the integration in this way allows us to cancel errors from truncating higher order terms; more precisely, every odd higher order term cancels. The Verlet algorithm uses positions and accelerations at time $t$, and positions at time $t - \delta t$.

Molecular dynamics can be used to assay the free energy of interaction between the ligand and receptor; however, in this thesis it is just used to get a qualitative understanding of the interaction between the ligand and protein.

Setting up an MD run

Assuming there is a crystal structure available to us, the first thing that must be done is to fix all the artefacts left by the crystallographic resolution of the structure. Firstly, there are often atoms or groups of atoms that are unresolved. These need to be added in manually; for this purpose MOE [49] was used intensively. Secondly, all the hydrogens must be added, because X-ray crystallography rarely resolves hydrogen atoms: X-rays scatter from electrons, and hydrogen has only one. To protonate all atoms which are missing hydrogens at pH 7.4, the Propka optimizer has been used [50].
Finally, a quick minimization of the full structure is done to get rid of any bad contacts in the crystal structure. For this, either the Amber software package [51] is used, by setting the imin option to 1, or the Schrödinger software package is used by running the OPLS3 structure minimization.

To get information from an MD simulation, we must run the dynamics under conditions similar to what might be seen in a lab; this is taken as 300 Kelvin (K) and 1.01 bar. However, the crystal structure is not at that temperature, as it does not have momentum information associated with the atoms. Therefore we must heat up the system first, then equilibrate the system to constant pressure, and finally we can start running interesting dynamics. Heating the system up consists of a temperature ramp from 0 to 300 K. If either of these steps is done too quickly, instabilities may arise which disrupt the dynamics in a way that is not easily recoverable. To keep the temperature constant a thermostat is used, which is an algorithm that modifies the dynamics of the system such that a certain temperature is maintained on average. In simulations done with the Schrödinger software suite this is a Nosé-Hoover thermostat, but in Amber the Langevin thermostat is used with a coupling constant of 1 picosecond (ps) [52],[53]. After our system is heated up we must equilibrate the pressure to sample from the NPT ensemble (constant number of particles, pressure, and temperature). This is done with a barostat, for which Berendsen's algorithm is used with 1 ps coupling [54].

One more case which must be mentioned is when the system is quite large and not possible to simulate in its entirety. For example, in the pathological case a protein may be skinny but long; its length will dictate the minimum edge length of the box in which the simulation is done, so that the protein does not interact with itself through the periodic boundary conditions used in the simulation.
In order to replicate the dynamics of a realistic system, this box must be filled with water molecules. As the position and momentum of each atom need to be kept track of, this amounts to 6N coordinates, and quite often there are more water atoms than protein atoms. The bulk of the computation may therefore be dedicated to moving around irrelevant water molecules, which is quite wasteful. Luckily there is a solution to this: one may use a constant dielectric in place of the water, which is called an implicit solvent model. One of the most common algorithms is known as Generalized Born (GB). This is essentially a fast method for calculating interactions with this constant dielectric, which formally is an approximation to the Poisson-Boltzmann equation [55]. Here constant pressure cannot be used, as there are large empty spaces in the simulation box, although this is not one of the major sources of approximation in these models, as their dynamics still mimic molecules at infinite dilution quite well.

2.2 Ligand Based Methods

In general, ligand based methods are only available once there are known ligand candidates. Many ligand based methods can be made into receptor based methods by adding receptor information in some form. This can be easily done in instances of machine learning; however, machine learning will be discussed in the context of pure ligand based methods, as it is used here in this form.

2.2.1 Ligand Similarity Based Methods

Once we have found a candidate molecule, we would like to get an idea of the SAR between the hit and the active site, so we would like to find more molecules similar to what we have already found. At this early stage of drug discovery we are also interested in finding more active compounds by searching through large databases. The most general way of doing this is with a similarity metric between two molecules, the most popular being intersection over union, or Tanimoto similarity:

\[ T(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{2.20} \]

where A and B are two molecules which we query for similarity.
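As a small illustration of Equation 2.20, the sketch below computes the intersection over union for two molecules, assuming each one is represented as the set of indices of its 'on' features. The function name and the convention that two empty sets count as identical are assumptions of this example.

```python
def tanimoto(a, b):
    """Intersection over union of two feature sets (Equation 2.20)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption of this sketch: two empty representations are identical.
        return 1.0
    return len(a & b) / len(union)

# Two hypothetical molecules sharing features 1 and 9.
molecule_a = {1, 5, 9, 12}
molecule_b = {1, 9, 20}
similarity = tanimoto(molecule_a, molecule_b)  # 2 shared / 5 total = 0.4
```

The corresponding distance is simply one minus this similarity, which is convenient when a method such as nearest-neighbour search expects a distance rather than a similarity.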
A and B must be calculated from a molecule, and most frequently fingerprints are used. This similarity metric may be used when the molecular representation follows the definition of a set; count based or binary features are examples of this. Fingerprints are a representation of the molecule, and can be of an arbitrary number of dimensions depending on how much information one wishes to include. In this thesis only 1D fingerprints are used in the form of bit vectors, as these are well established for usage in machine learning. One popular way of producing these bit vectors is with ECFP fingerprints, or Morgan fingerprints (as implemented in RDKit). This algorithm assigns an n-bit array (where n is often a power of two) to a molecule, which is used to represent it and can be made unique for each molecule.

Figure 2.1: The ECFP fingerprint hashing process. Benzoic acid amide is shown with hashing at various iterations for atom 1. Atoms denoted with "A" have not been included in the hash yet [6].

Briefly, the algorithm starts out by assigning atom identifiers, or atom invariants, to each atom in the molecule. These are usually lists of numbers characterizing the atom, like Sybyl atom-types, AlogP atom codes, and Daylight's atom invariants, which are the number of neighbours (non-hydrogens), the valence minus the number of hydrogens, the atomic mass, the atomic charge, the number of attached hydrogens, and whether the atom is in a ring or not. This array is then hashed to an n-bit integer. Following this is an iterative updating stage where each identifier is updated to reflect the identifiers of each atom's neighbours. This is done by forming an array for each atom, where each component in the array is a tuple containing the bond order to the neighbouring atom(s) and the neighbour's integer identifier. This array is hashed again into an n-bit integer. As this process is repeated for each atom, the representation at each atom becomes a representation of that atom and its neighbours.
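The initial assignment and iterative update just described can be sketched as follows. This is a simplified, ECFP-like toy and not the RDKit implementation: the molecule is supplied by hand as a list of atom invariants and a bond table, `zlib.crc32` stands in for the hash function, duplicates are removed by identifier value only, and all function names are hypothetical.

```python
import zlib

def hash_to_int(values, n_bits=32):
    """Hash a sequence of integers to an n-bit integer (crc32 is a stand-in)."""
    return zlib.crc32(repr(tuple(values)).encode("utf-8")) % (2 ** n_bits)

def ecfp_like_identifiers(invariants, bonds, radius=2, n_bits=32):
    """Collect atom-environment identifiers up to the given radius.

    invariants: one tuple of integers per atom (for example Daylight-style).
    bonds: dict mapping atom index -> list of (bond_order, neighbour_index).
    """
    # Iteration 0: hash each atom's own invariants.
    ids = [hash_to_int(inv, n_bits) for inv in invariants]
    features = set(ids)
    for iteration in range(1, radius + 1):
        new_ids = []
        for atom, current in enumerate(ids):
            # Sort the (bond order, neighbour identifier) tuples so the
            # result does not depend on the order atoms were listed in.
            neighbours = sorted((order, ids[nbr]) for order, nbr in bonds[atom])
            flat = [iteration, current]
            for order, nbr_id in neighbours:
                flat.extend((order, nbr_id))
            new_ids.append(hash_to_int(flat, n_bits))
        ids = new_ids
        features.update(ids)  # the set silently drops duplicate identifiers
    return features

def fold_to_bits(features, length=1024):
    """Fold the identifiers into the indices of 'on' bits of a bit vector."""
    return {f % length for f in features}

# Propane heavy atoms: (neighbours, valence - H, mass, charge, H count, in ring)
methyl = (1, 1, 12, 0, 3, 0)
methylene = (2, 2, 12, 0, 2, 0)
propane = ecfp_like_identifiers(
    [methyl, methylene, methyl],
    {0: [(1, 1)], 1: [(1, 0), (1, 2)], 2: [(1, 1)]},
)
on_bits = fold_to_bits(propane)
```

Because the two methyl carbons see identical environments at every radius, they contribute the same identifiers and appear only once in the resulting set.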
As the update is performed multiple times, each integer represents information about the molecule from the perspective of each atom, up to a radius of atoms n updates away (Figure 2.1). As some atoms will inevitably have neighbourhoods which completely overlap up to a certain radius, their identifiers will be the same and the duplicates are removed [6].

3D molecular representations were also used to assess similarity more robustly. One method for this is known as ROCS [56]. ROCS was designed by OpenEye to assess molecular similarity by taking into account both shape and chemical features. This is done by integrating molecular shapes to determine their overlap. Each atom is represented by multiple Gaussians, with each Gaussian representing a property like shape, hydrogen bond potential, electrostatic potential, etc. Then, to assess similarity, we calculate the squared difference between the shapes like so:

\[ S = \int \left( V_1(x,y,z) - V_2(x,y,z) \right)^2 dx \, dy \, dz \tag{2.21} \]

where $V_i$ is the volume of the molecule, calculated with:

\[ V = \int \left[ 1 - \prod_{i}^{N} (1 - f_i) \right] dx \, dy \, dz \tag{2.22} \]

Here $f_i$ is the Gaussian for atom i. Equation 2.21 calculates the similarity such that $S(V_1, V_1) = 0$. The equation is quite intuitive: when expanded out, we see that we take something like the volume of each molecule, add them together, and then subtract something like two times the overlap volume between the two shapes.

2.2.2 Machine Learning in Drug Discovery

The use of some machine learning method in drug discovery, or Quantitative Structure Activity Relationship (QSAR), dates back to at least 1937, when Hammett published his seminal work on the discovery of an empirical relationship between equilibrium or rate constants and chemical substituents. It was found that a substituent would affect another reaction in the same way it does a reference reaction, modulo some reaction constant. This is known as the linear free energy relationship [57].
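For reference, the linear free energy relationship is usually written in the form of the Hammett equation. The symbols below follow the common textbook convention rather than notation used elsewhere in this thesis:

```latex
\[ \log_{10}\!\left(\frac{k}{k_0}\right) = \sigma \rho \]
```

Here $k$ is the rate (or equilibrium) constant of the substituted compound, $k_0$ is that of the unsubstituted reference compound, $\sigma$ is a constant that depends only on the substituent, and $\rho$ is the reaction constant that depends only on the type of reaction and its conditions.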
The key to his work was to develop a linear equation that related a thermodynamic or kinetic quantity to some molecular property, like the substituent type, or the steric volume with and without the substituent. Although it was only relevant to benzene derivatives, it was later extended to other types of molecules, and this relationship was used to help elucidate reaction mechanisms and find different substituents with desired effects. In essence, this is what modern QSAR seeks to accomplish. The idea is to discover a relationship between easily calculated properties of a molecule and a more complicated property, such as the binding free energy to a receptor. Machine learning seeks to do this by learning this relationship in a data driven way.

Machine learning can be broken up into supervised and unsupervised methods. Supervised methods seek to learn p(y|x), where y is the quantity of interest and x is the easy to calculate molecular property. On the other hand, unsupervised methods generally attempt to learn p(x), i.e. the data generating process. Sometimes one might learn p(x) as a by-product of learning p(y|x); one example of this is known as Naive Bayes (as we will see, y and x are flipped in this scenario), which will be discussed later. Almost all machine learning algorithms learn by optimizing a quantity called the log-likelihood. Given a set of N data points and labels $\{x_i, y_i\}_{i}^{N}$, the log-likelihood is written as:

\[ \mathcal{L}(y|x,\theta) = \sum_i \log\left(p(y_i|x_i,\theta)\right) \tag{2.23} \]

where the data is assumed to be independent and identically distributed (IID). Here $\theta$ represents the model parameters. For complicated models the maximizer is not available analytically, as the relationship of y with $\theta$ and x is quite complicated and non-convex. The log-likelihood is used instead of the likelihood for a few reasons:

1. The log makes the product into a sum, which decouples the gradient between each datum, allowing for efficient parallelization.
2. The log maps a very small number (between 0 and 1) to a moderately sized negative number, whereas multiplying probabilities against each other many times over may lead to numbers so small that they are rounded off to zero.
3. There exists a connection between the log-likelihood and information theory, which has led to more accurate and useful machine learning algorithms.

For example, we do not actually optimize the likelihood directly, but we minimize the cross-entropy:

\[ -\mathbb{E}_{p^*(y|x)\,p^*(x)}\left[\log\left(p(y_i|x_i,\theta)\,p(x_i|\phi)\right)\right] \tag{2.24} \]

where $p^*$ is the true distribution, $\theta$ are the parameters of our supervised model, and $\phi$ the parameters of the prior. This quantifies the redundancy of using a coding scheme from our learned distribution for codes coming from our true distribution. Next, the machine learning algorithms used in this thesis will be discussed.

Decision Trees

A decision tree algorithm is known as a high variance, low bias algorithm, as it can fit any dataset with minor assumptions. Part of its power comes from it being non-parametric. The decision tree algorithm works by recursively partitioning the input space such that samples with the same labels are kept together. The algorithm can be summarized as follows:

1. Choose a feature dimension, and a threshold on which to split.
2. Calculate the impurity for the given split:

\[ G = \frac{n_{x \le threshold}}{N_m} H(x \le threshold) + \frac{n_{x > threshold}}{N_m} H(x > threshold) \tag{2.25} \]

3. Find the threshold and feature that minimize the impurity.

Here $N_m$ is the total number of data points under consideration, $n_{x \le threshold}$ the number of data points less than or equal to a given threshold for feature x, and H is some measure of the impurity of the labels on one side of the split [58]. Some common functions that measure the quality of the split are cross-entropy, misclassification error, and finally the most popular, the Gini index [59]:

\[ H(X) = \sum_k p_{mk}(1 - p_{mk}) \tag{2.26} \]

where $p_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k)$ is the empirical probability of a given class in a region $R_m$.
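The split search summarized in Equations 2.25 and 2.26 can be sketched directly. This is a minimal, exhaustive version for a single tree node under simple assumptions: features are numeric, every observed value is tried as a threshold, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels (Equation 2.26)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def split_impurity(values, labels, threshold):
    """Weighted impurity G of splitting one feature at a threshold (Equation 2.25)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, labels):
    """Try every feature and observed threshold; return (impurity, feature, threshold)."""
    best = None
    for feature in range(len(rows[0])):
        column = [row[feature] for row in rows]
        for threshold in sorted(set(column)):
            g = split_impurity(column, labels, threshold)
            if best is None or g < best[0]:
                best = (g, feature, threshold)
    return best

# Four samples, two features; feature 0 separates the classes at x <= 2.
rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
impurity, feature, threshold = best_split(rows, labels)
```

A full decision tree applies `best_split` recursively to the left and right subsets until a stopping rule is met, such as a maximum depth or a pure node.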
The Gini index is H = 0 if the split leads to a completely pure set of labels.

Random Forests

Random Forests (RF) is an ensemble method that learns many decision trees and makes predictions by averaging the output of each decision tree. To build the ensemble of trees, the method of bootstrap aggregation is used on both the data and the features. When building a decision tree, a subset of the features and a subset of the data are sampled with replacement. This de-correlates the trees, and thus the average leads to a lower variance estimator [60].

K-Nearest Neighbours

K-Nearest Neighbours (KNN) is another example of a high variance, low bias algorithm, as it can fit any training dataset perfectly (with k = 1). One of the assumptions this algorithm makes is that the label of a data point is locally constant, i.e. in a given region R around some data point $x_0$, the label y satisfies $y = c : \forall x \in R$, where c is a constant. KNN is a simple, non-parametric algorithm, although it can be quite cumbersome for large datasets, as its naive implementation requires all the data to be stored in RAM. There is really no training phase for the simplest version of the algorithm, and the classification rule is simple [60]:

\[ P(y = j | X) = \frac{1}{k} \sum_{x_i \in N_k(X)} 1\{y_i = j\} \tag{2.27} \]

where $N_k(X)$ are the k closest neighbours to X, and the indicator counts those with label j. The final component we need to describe KNN is a similarity metric. Although the Euclidean distance $\sqrt{(x - y) \cdot (x - y)^T}$ is intuitive in three dimensions, it breaks down in higher dimensions; this phenomenon is referred to as the curse of dimensionality [61]. In practice we must screen different values of p for the p-norm:

\[ \left( \sum_i |x_i - y_i|^p \right)^{1/p} \tag{2.28} \]

Support Vector Machines

Support Vector Machines (SVM) are more biased but lower variance estimators than those discussed earlier. SVMs are maximum margin classifiers, i.e. they find a class separating boundary with a maximum margin between the boundary and the support vectors.
The support vectors are the data points closest to the hyperplane, and are needed in memory to classify a new query. The optimization goal is:

\[ \max_{W,b}\; (W \cdot x_j + b)\, y_j \ge 1 \tag{2.29} \]

where $W, b$ are the parameters, $y$ are the true classes, and $x$ the data. Here $W, b$ define the decision boundary, and anything that is classified correctly above or at the margin of 1 will have $(W \cdot x_j + b)\, y_j \ge 1$. In this case the margin length is $\frac{2}{\|W\|}$. In practice, the parameters are learned using the following objective [60]:

\[ \max_{\alpha \ge 0}\; \sum_{j} \alpha_j - \frac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j\, (x_i \cdot x_j) \quad \Big| \quad \sum_{j} \alpha_j y_j = 0 \tag{2.30} \]

Note that the sum is over all training examples and $\alpha$ are Lagrange multipliers. This objective function is derived from a less opaque problem which is more obviously related to maximizing the margin:

\[ \min_{w,b}\; \frac{1}{2}\, w^{T} \cdot w \quad \Big| \quad (w \cdot x_j + b)\, y_j \ge 1 \tag{2.31} \]

In the case of binary classification $y \in \{-1, 1\}$, and so $(w \cdot x_j + b)\, y_j$ is $\ge 1$ if we have guessed the right class (with the point on or outside the margin), and negative otherwise. In this format we can easily see the equation for the line, showing us the linear boundary which we are trying to find.

In this form, the problem is convex, and so the classification rule for an arbitrary $x$ is:

\[ y \leftarrow \mathrm{sign}\Big[\sum_{i} \alpha_i y_i\, (x_i \cdot x) + b\Big] \tag{2.32} \]

Here the sum is over only the support vectors $x_i$. It should be noted that we can generalize SVMs to non-linear decision boundaries by substituting $K(x_i, x)$ for $(x_i \cdot x)$. $K$ is known as a kernel and can be thought of as a similarity metric between the support vectors and the query vector. One of the most popular is the Radial Basis Function (RBF) kernel (Equation 2.33):

\[ K(x_i, x) = \exp\left(-\frac{\|x_i - x\|^{2}}{2\sigma^{2}}\right) \tag{2.33} \]

Figure 2.2: The directed graphical model describing Naive Bayes.

The SVM that has just been described is known as the hard-margin SVM. However, we might have noise in our data, or the data might simply not fit the assumptions of the SVM. In this case we use what are called 'slack variables', with a coefficient that allows us to trade off how large our margin is against how many data points we classify correctly.
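To make the kernelized classification rule concrete, the following minimal sketch evaluates Equation 2.32 with the RBF kernel of Equation 2.33 substituted for the dot product. The support vectors, multipliers and bias below are hand-picked, hypothetical values chosen only so that the example runs; in a real SVM they would be obtained by solving the dual problem of Equation 2.30.

```python
import math


def rbf_kernel(a, b, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)), as in Equation 2.33."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_predict(x, support_vectors, labels, alphas, b, sigma=1.0):
    """Kernelised Equation 2.32: sign(sum_i alpha_i y_i K(x_i, x) + b)."""
    score = sum(a * y * rbf_kernel(sv, x, sigma)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


# Hypothetical values for illustration only (not learned from data).
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]  # chosen so that sum_j alpha_j y_j = 0
b = 0.0

print(svm_predict([0.2, 0.1], support_vectors, labels, alphas, b))  # query near the -1 support vector
print(svm_predict([1.9, 2.2], support_vectors, labels, alphas, b))  # query near the +1 support vector
```

Only the support vectors enter the sum, which is why they are the only training points that must be kept in memory at prediction time.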
The primal problem is then written as:

\[ \min_{w}\; \frac{1}{2}\, w \cdot w + C \sum_{i} \xi_i \tag{2.34} \]
\[ \text{s.t.} \quad (w \cdot x_i + b)\, y_i \ge 1 - \xi_i \tag{2.35} \]
\[ \xi_i \ge 0 \tag{2.36} \]

Here $\xi_i$ are the slack variables, i.e. how much we permit the margin to be violated by each data point, and $C$ is a hyper-parameter which we tune to change the emphasis on the slack variables, i.e. with a smaller $C$ we allow more slack. In drug discovery, we often have to deal with imbalanced datasets, i.e. when there is an abundance of one class over the other. In this case we may want to put more emphasis on our positive class than on our negative class. To incorporate this information we multiply $C$ by a weight for each of the data points we would like to re-weight.

Naive Bayes

The Naive Bayes (NB) algorithm defines a directed graphical model (Figure 2.2) which describes a causal relationship between the class, $p(C_k)$, and the features of the data given the class, $p(x|C_k)$. With these two quantities defined, we can then do inference in this graphical model with Bayes' rule:

\[ p(C_k|x) = \frac{p(C_k)\, p(x|C_k)}{p(x)} \tag{2.37} \]

where $p(C_k)$ is the prior, $p(x|C_k)$ the likelihood, and the denominator is known as the evidence. On the left hand side we have the posterior, which is the probability of the class given the data. As $p(x) = \int p(C_k)\, p(x|C_k)\, dC_k$, it can be intractable. Thus in general we ignore the denominator, as it is independent of the class anyway:

\[ p(C_k|x) \propto p(C_k)\, p(x|C_k) \tag{2.38} \]

which allows us to only worry about the joint distribution. The approach is known as Naive Bayes because we assume that the features are independent given the class:

\[ p(x|C_k) = \prod_{i} p(x_i|C_k) \tag{2.39} \]

Then to classify we seek the maximum a-posteriori (MAP) decision:

\[ y_{\mathrm{pred}} = \arg\max_{k}\; p(C_k) \prod_{i} p(x_i|C_k) \tag{2.40} \]

The class prior may be calculated empirically from the data or assumed to be uniform. Here the parameters of the model are probabilities which we learn with maximum likelihood [60].

Root Mean Squared Deviation (RMSD)

Throughout the work done for this thesis, molecules are compared to each other; sometimes they are different molecules, other times they are the same molecule in different positions.
The first case has been handled in the above sections (Section 2.2.1); it is however the second case which still needs discussion. Fortunately this is not as ambiguous a task as the first: we only need to measure the same molecule, but perhaps in a different position, and since these molecules live in three dimensions, we can safely use a Euclidean distance metric without fretting over the curse of dimensionality. For this we use RMSD:

\[ \mathrm{RMSD} = \sqrt{\frac{\sum_{i} (x_{1i} - x_{2i})^{2}}{N}} \tag{2.41} \]

where $x_{1i}$ is atom $i$ from molecule 1, $x_{2i}$ the same atom in molecule 2, and $N$ the total number of atoms.

Algorithm evaluation metrics

In the work done in this thesis there is a need to determine how well an algorithm is performing. The simplest measure for this is accuracy; although it is intuitive, very often it is not that informative. Take for example a supervised anomaly detection algorithm for bank fraud: most transactions will be non-fraudulent, while an almost vanishingly small number of transactions will be fraudulent. If our algorithm just guesses that every transaction is non-fraudulent it will be right most of the time, and its accuracy will be meaninglessly high. One solution to this is to include precision (positive predictive value) and recall (true positive rate) in our scoring:

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2.42} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{2.43} \]

where $TP$, $FN$, $FP$ are true positives, false negatives, and false positives respectively. In the same scenario as above, the algorithm would have a score of 0 if it never guessed the occurrence of fraud correctly. A measure that combines both precision and recall equally is known as the F1 score [62]:

\[ F1 = \frac{2}{\frac{1}{\mathrm{Recall}} + \frac{1}{\mathrm{Precision}}} \tag{2.44} \]
\[ \text{where} \quad F1 \in [0, 1] \tag{2.46} \]

This is the harmonic mean of precision and recall, which is often used to average ratios and is $\le$ the arithmetic mean.

Another possible choice is known as the Matthews Correlation Coefficient (MCC). This method is useful when false negatives and false positives are equally important.
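The bank fraud example can be made concrete with a short sketch comparing accuracy with precision, recall and F1. The data below are made up for illustration, and returning 0 when a ratio is undefined (no positive predictions, or no positives in the data) is a convention assumed here, not something prescribed by the definitions above.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision (Eq. 2.42), recall (Eq. 2.43) and F1 (Eq. 2.44)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    # Assumed convention: an undefined ratio is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up data: 2 fraudulent (1) and 98 legitimate (0) transactions.
y_true = [1, 1] + [0] * 98
always_legit = [0] * 100               # never predicts fraud
detector = [1, 0] + [1] + [0] * 97     # catches one fraud, raises one false alarm

print(accuracy(y_true, always_legit), precision_recall_f1(y_true, always_legit))
print(accuracy(y_true, detector), precision_recall_f1(y_true, detector))
```

Both predictors reach the same accuracy on this toy set, but only the second one has a non-zero F1, which is the behaviour the text describes.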
F1, by contrast, is often used when there is an emphasis on true positives, as F1 is always zero if no true positives are assigned [63].

\[ \mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{2.47} \]
\[ \text{where} \quad \mathrm{MCC} \in [-1, 1] \tag{2.49} \]

Here an MCC of -1 means there is a perfect negative correlation between the output of the algorithm and the true output, while 1 means the correlation is perfectly positive [63]. Finally, Cohen's kappa coefficient ($\kappa$) is also made use of in this thesis. This is a statistic meant to measure the agreement between two raters who classify $N$ items into mutually exclusive classes.

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \tag{2.50} \]
\[ \text{where} \quad \kappa \in [-1, 1] \tag{2.52} \]
\[ \text{and} \quad p_e = \frac{1}{N^{2}} \sum_{k} n_{k1}\, n_{k2} \tag{2.54} \]

Here $p_o$ is the relative agreement amongst raters (the accuracy), $p_e$ is the probability of agreement by chance, and $n_{k1}$ is the number of times rater 1 predicted category $k$. If the raters are in complete agreement then $\kappa = 1$; if agreement is completely by chance then $\kappa = 0$ [64].

K-fold Cross Validation

When looking for the right algorithm to use for one's data, one often runs the algorithm on some training set. However, the score which the algorithm receives on that training set is sometimes not representative of its score on an unseen set of data, so it must be tested on data it has not seen. One example of this is when the training score is much higher than the testing score, indicating overfitting. Furthermore, if one uses this unseen dataset, called a validation set, many times over to find the best algorithm, parameters, or features, one might actually end up overfitting on the validation set, and so another dataset, the test set, needs to be used. However, the test set must only be evaluated after choosing an algorithm and must not be used to pick hyper-parameters or features. This is a reasonable setup if both the test and validation sets are large enough to get a good estimate of the score of an algorithm. However, in the case that a dataset is small, one may not be able to afford splitting the data three ways.
In this case K-fold Cross Validation (CV) may be used. Here the data is split into K chunks; the model is trained on K-1 chunks and tested on the remaining one. This is repeated K times so that each chunk is used once for testing, and the score is averaged over all K runs. In this thesis K-fold cross validation is used in place of a validation set.

Chapter 3

Applied Drug Discovery: Topoisomerase I

3.1 Preface

This section is a replication of the article published in ACS Chemical Biology [3]. It should be noted that all the wet-lab work, i.e. the relaxation experiments, organic synthesis, and cellular assays, was done by Yunrui Cai, Huajian Zhu, and Wenjun Yu under the supervision of Hongbin Zou at Zhejiang University.

3.2 Abstract

Monoterpenoid Indole Alkaloids (MIAs) comprise an important class of molecules for drug discovery, and they have variant carbon skeletons with prominent bioactivities. For instance, in spite of limitations to their use, camptothecins are the only clinically approved TOP I inhibitors. The enzyme Strictosidine Synthase (STR1), which is key for MIA biosynthesis, was applied to the enantioselective preparation of three N-substituted (S)-acTHA derivatives. These non-camptothecin MIAs were shown to have moderate in vitro HepG2 cytotoxicity and TOP I inhibition activities. The (S)-configured MIAs had stronger cytotoxicity and Top1 inhibition than their chemically synthesized (R)-enantiomers, which aligned with the results of molecular dynamics simulations. A series of N-substituted (S)-3,14,18,19-Tetrahydroangustines (THAs) were then chemoenzymatically synthesized to investigate structure-activity relationships. The most active analogue observed was the N-(2-Cl benzoyl)-substituted derivative (7i).

Figure 3.1: Chemo-enzymatic Synthesis of THA
The binding mode of 7i in a TOP I-DNA covalent complex was investigated by molecular dynamics simulations, which will facilitate future efforts to optimize the TOP I inhibitory activities of non-camptothecin MIAs.

3.3 Introduction

MIAs comprise an important group of natural secondary metabolites including Camptothecin (CPT), ajmaline, vindoline, quinine, and others (Figure 3.2); they are famous for their diverse structural skeletons and prominent pharmacological activities.[65],[66] STR1 (Figure 3.2) is a gateway enzyme that conducts asymmetric Pictet-Spengler (PS) condensation between tryptamine and secologanin into strictosidine for the biosynthesis of some 2000 MIAs.[67],[68] Owing to the critical function of STR1 in the biosynthetic pathway of MIAs, it has been comprehensively studied in chemical biology following the disclosure of its 3D structure.[68],[69],[70],[71],[72],[73],[74] The PS condensation catalyzed by STR1 is very attractive due to its advantages of high stereoselectivity and efficiency as well as mild reaction conditions. However, there are few examples demonstrating the chemoenzymatic use of STR1 to synthesize new alkaloids in the search for bioactive substances.[69],[75] Previously, we reported the preparation of several MIAs by one-step syntheses starting from the STR1 reaction product strictosidine.[76] Since CPT, a natural product MIA, was established as a TOP I inhibitor with excellent antitumor activity,[77] TOP I has become a hot target in cancer chemotherapy due to its role in DNA replication, transcription, and other processes through the relaxation of supercoiled DNA.
[78],[79] Although camptothecins such as topotecan, irinotecan, and belotecan are the only three clinically approved TOP I inhibitors,[79],[80] they have several limitations to their use, specifically their poor chemical stability due to the spontaneous opening of the lactone.[78],[80] Therefore, discovery of novel non-CPT TOP I inhibitors has become a promising research field, and several novel TOP I inhibitors have been reported.[81],[82] These diversified inhibitors include compounds in clinical trial such as indenoquinolines (BMS-247615, Phase II),[83] indenoisoquinolines (NSC-724998, Phase I),[84] and dibenzonaphthyridinones (Genz-644282, Phase I),[85] as well as some molecules under preliminary investigation such as the evodiamines (Figure 3.3).[86] THA (Figure 3.2) and angustine are biogenetically proposed to belong to the STR1-catalyzed MIAs and show in vitro antiproliferative and anti-inflammatory activities.[87],[88] When first isolated from Strychnos angustiflora,[89] the carbon skeleton of angustine aroused significant interest from chemists to attempt its total synthesis, and several successful strategies were reported.[90],[91],[92] However, these strategies include many steps and low yields or lack the flexibility required for facile analogue production. Notably, the enantiomer 3(R)-THA was also generated from hydrogenated vincosamide, and the selective 3R configuration originated from the spontaneously lactamized product of vincoside, isolated from Adina rubescens.[93] Interestingly, the structural arrangement of THA is similar to that of CPT and the evodiamines (Figure 3.3A), which suggests that compounds with this skeleton may also be potential TOP I inhibitors.
A Structure Activity Relationship (SAR) study of evodiamines accordingly inspired the synthesis of N-acylated THA derivatives. In order to also identify which configuration might promote the desired bioactivity, 3(S)-THAs (7a-c) were prepared by an STR1 chemoenzymatic approach, and their R-enantiomers (8a-c) were chemically synthesized from vincosamide. Preliminary inhibitory screening results showed that the 3(S)-THA derivatives (7a-c) were superior for both in vitro HepG2 cytotoxicity and TOP I inhibition compared to their (R)-enantiomer counterparts (8a-c), which aligned with the results of MD simulations. Initiated by immobilized STR1 as a biocatalyst, a series of 3(S)-THAs was stereoselectively generated (Figure 3.3B) and tested in vitro for TOP I inhibitory activity and HepG2 cytotoxicity.

Figure 3.2: STR1 as a gateway enzyme for the biosynthesis of natural MIAs.

3.4 Results and Discussion

3.4.1 Chemistry

The synthesis of strictosidine can be achieved via the condensation of tryptamine and secologanin in water by reflux in the presence of acetic acid, leading to a mixture of strictosidine and the spontaneously lactamized product of vincoside, vincosamide. Pure vincosamide can be obtained by filtration due to its lower water solubility, leaving a strictosidine mixture that is challenging to purify.[94] In the novel approach presented, strictosidine (1) was synthesized by the STR1-catalyzed enantioselective PS reaction of tryptamine and secologanin under mild reaction conditions and then converted into strictosidine lactam (2) in the presence of sodium carbonate. The synthetic routes to N-substituted (S)-THA derivatives 7a-r and their three R-enantiomers 8a-c are depicted in Figure 3.5. The hydrogenated product (3) of 2 was gently incubated with β-glucosidase, extracted from almonds, to give its aglycone (4), which reacted with ammonium acetate in refluxing ethanol to afford its E ring aza-product (5).
Reaction of compound 5 with p-toluenesulfonic acid (p-TsOH) in refluxing toluene led to dehydration and oxidation to provide the heteroaromatized (S)-THA (6). In the presence of potassium tert-butoxide (t-BuOK), compound 6 was treated with acyl or alkyl halides to give the N-substituted (S)-THA derivatives 7a-r. With vincosamide in hand, the same strategy was applied to prepare the corresponding N-substituted (R)-3,14,18,19-THA derivatives (8a-c).

3.4.2 Study of the Favorable Configuration of C-3

In order to identify the favorable configuration of C-3, six N-substituted THA derivatives (7a-c, 8a-c) were synthesized and screened for in vitro TOP I inhibitory activity and antiproliferative activity using HepG2 cells. Many TOP I inhibitors act by stabilizing a covalent TOP I-DNA complex, a so-called "cleavable complex", to prevent the relaxation of supercoiled DNA. To investigate their inhibitory activity, TOP I and supercoiled pBR322 DNA were incubated in the presence of the test compounds or CPT. Inhibition was observed when the supercoiled complexes were prevented from being relaxed. As shown in Figure 3.4A, compounds 7a-c and 8a-c are all active against TOP I-mediated relaxation of supercoiled DNA at a high concentration (500 µM), indicating that N-substituted THA derivatives were novel non-CPT inhibitors of TOP I. At a lower concentration (200 µM), 7a-c and 8a maintained inhibition, whereas none was observed for 8b and 8c. Meanwhile, compound 7a was slightly more effective at preventing supercoiled DNA relaxation than its enantiomer (8a). These results suggest that N-substituted THA derivatives with the S configuration possess better Top1 inhibitory activity than their R-enantiomers. To further test the hypothesis of a preferred C-3 configuration, compounds 7a-c and 8a-c were also assayed for their in vitro cytotoxic activities against the HepG2 cell line using a 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay.
The results listed in Table 3.1 also showed that N-substituted THA derivatives with the S configuration (7a-c) are each about 3-fold more potent than their R-enantiomers. In addition, compound 7a, having a 4-chlorobenzoyl substituent, exhibited the most potent cytotoxic activity of the analogues tested, which was consistent with the observed TOP I inhibitory activity.

Figure 3.3: STR1 as a biocatalyst for potential Top1 inhibitor THAs.

In order to rationalize the differences between the S- and R-enantiomers of the parent compounds (7a-c and 8a-c), molecular dynamics (MD) simulations were performed with the OPLS3 force field implemented in Maestro 2016-3 software (Figure 3.6).[95],[45],[96] The docked structures of the S-enantiomer (7a) and R-enantiomer (8a) in Top1 (PDB ID: 1k4t) were simulated for 156 and 100 nanoseconds (ns), respectively, and visualized with the MOE software suite. After the first 60 ns of the simulation of compound 8a, most of the compound was observed to be pushed out of the DNA, leaving only part of the pyridine ring between the bases. The system was simulated for another 100 ns to see if the compound would find its way back between the nucleobases. This was not observed; however, after a 100 ns simulation of compound 7a, an induced fit was observed with π-stacking interactions that remained strong throughout the simulation (Figure 3.6B). To accommodate compound 7a, the pocket between the DNA bases increased vertically by around 1 Å (Figure 3.7B, green). Furthermore, it can be seen that adding large substituents on the R-enantiomer results in an even more unfavorable geometry due to the steric interaction resulting from the large out-of-plane angle of the stereocenter (Figure 3.6C). Therefore, it appears that the lack of potency of the R-enantiomers of this set was largely correlated to the out-of-plane angle between the indole ring and the rest of the ring system.
The S-enantiomer is much more planar, facing more empty room in the pocket, allowing for free substitution on the indole nitrogen (Figure 3.6D). This explains the greater flexibility for modification afforded to the S-enantiomer, with the 4-chlorobenzoyl analogue exhibiting the most potent cytotoxic activity of the compounds tested.

3.4.3 Structure-Activity Relationship

Substituted benzoyl, acyl, cyano, and alkyl groups were introduced to prepare N-substituted (S)-THA derivatives 7d-r for SAR studies (Figure 3.5). As shown in Figure 3.4B, at a concentration of 200 µM, compounds 7a-c, 7h, and 7i were found to be active against TOP I-mediated relaxation of supercoiled DNA, whereas compounds 7d, 7f, and 7o showed moderate TOP I inhibitory activity; the rest were almost inactive. Overall, N-benzoyl compounds exhibited the most potent TOP I inhibitory activity, especially those with a Cl-substituted benzoyl group (7a, 7h-i). This can be understood from the results of the simulation, where it is apparent that the benzoyl ring makes a close interaction with a methionine residue from TOP I (Figure 3.7C). Compound 7i was then selected to test its in vitro TOP I inhibitory activity at three different concentrations with 7a and CPT as controls (Figure 3.4C). Furthermore, the antiproliferative activity of 7a-r against the HepG2 cell line was investigated using CPT as a reference drug. As shown in Table 3.2, compound 7i was the most potent in vitro antiproliferative agent against HepG2, with a Concentration at 50% Cellular Inhibition (IC50) value of 1.8 µM. Overall, IC50 values of most N-benzoyl compounds and some N-acylated compounds were ≤ 10 µM, but most N-alkylated compounds exhibited weak cytotoxicity. Taken together, the results of the cytotoxicity assay are consistent with TOP I inhibitory activity.
To further understand the contribution of the novel THA pharmacophore toward TOP I inhibition, the most active derivative, compound 7i, was compared with CPT by docking into the final snapshot of the 7a simulated structure and superimposing this with the CPT crystal structure (Figure 3.7). The overlap of rings A, B, and C between these molecules suggested that their binding modes were similar (Figure 3.7A, Table 3.2). Interestingly, both internal amides of the two compounds overlapped in their nitrogen and carbonyl oxygen atoms in the same region, perhaps satisfying an electrostatic requirement. In addition, compound 7i was observed to induce its fit into the cleaved DNA complex with TOP I by increasing the size of the gap between the DNA bases relative to CPT (Figure 3.7B), in which 7i is stabilized by additional amino acid contacts that predominantly include methionine. This interaction appears to be driven by a strong aromatic-dipole interaction at the ortho-substituted benzoyl ring with methionine 428 (Figure 3.7C). Interestingly, during the simulation of compound 7a, a strong water-mediated contact with lysine 751 of TOP I was observed for more than half of the simulation (Supplementary Information Figure A.7, Figure A.8).

Table 3.1: In Vitro Cytotoxic Activity of the THA Enantiomers against HepG2

Compound    Cytotoxicity (IC50, µM)
7a          8.5 ± 0.06
7b          20.9 ± 0.09
7c          23.4 ± 0.17
CPT         0.68 ± 0.01
8a          29.9 ± 0.19
8b          65.2 ± 0.24
8c          57.8 ± 0.11

3.5 Methods

3.5.1 Topoisomerase I Docking of MIAs

The 1k4t entry of TOP I was downloaded from the Protein Data Bank and prepared with the Schrödinger Maestro protein preparation software module.[95] Briefly, all original hydrogens were removed and re-added, crystallographic ions and waters were removed, and the PROPKA optimizer at pH 7.4 was used to calculate the amino acid protonation states. The structure was then minimized with CPT using the OPLS3 force field.
Furthermore, the crystal structure was used to dock THA derivatives with the Glide module in XP mode, and the result was used as a starting point for the simulation. The highest scoring structure was used to prepare the simulated systems.

3.5.2 Topoisomerase I simulations

The MD simulations described in Section 2.1.2 were done on various ligand-receptor complexes. In all cases the OPLS3 force field, as implemented in the Desmond software suite [97], was used. In all cases the isothermal-isobaric ensemble (NPT) was used, as it corresponds most closely to laboratory conditions, at a temperature of 300 K and a pressure of 1.01 bar. The Nose-Hoover chain thermostat [53] and the Martyna-Tobias-Klein isotropic barostat [54] were used. The SPC water model was used (as it was optimized for OPLS3) and the protein was put in an orthorhombic box with a buffering distance of 10 Å. The system was neutralized with 18 Na+ ions, and 0.15 M NaCl was used to mimic physiological conditions. The starting coordinates for the simulation were taken from the docked structure.

3.5.3 Synthesis of N-Substituted THAs

Synthetic methods are summarized in the legend of the figure showing the synthesis of N-substituted THA derivatives.

3.5.4 Topoisomerase I Inhibition Assay

Enzyme activity was measured by assessing the relaxation of supercoiled pBR322 plasmid DNA. Test compounds were dissolved in Dimethyl sulfoxide (DMSO) and were tested at final concentrations as shown. The reaction mixture contained 2x DNA TOP I buffer (10 µL), TOP I (0.5 U), the test compounds (various concentrations), pBR322 plasmid DNA (0.25 µg), and distilled water in a final volume of 20 µL. Reactions were carried out for 30 min at 37 °C and then stopped by adding Sodium dodecyl sulfate (SDS) [0.5% (w/v) final concentration]. After that, 3.5 µL of 6x loading buffer [0.1 mM Ethylenediaminetetraacetic acid (EDTA), 7% (v/v) glycerol, 0.01% (w/v) xylene cyanol FF, and 0.01% (w/v) bromophenol blue] was added. Reaction products were electrophoresed on a 0.8% (w/v) agarose gel in TAE (Tris-acetate-EDTA) running buffer at 60 V for 1.5 h.
To visualize the reaction products, the gel was stained with 0.5 µg mL⁻¹ ethidium bromide for 10 min. DNA bands were visualized using a UV transilluminator.

3.5.5 In Vitro Cytotoxicity Assay

Cells were plated in 96-well microtiter plates at a density of 5 × 10³ cells/well and incubated in a humidified atmosphere with 5% CO2 at 37 °C for 24 h. Test compounds were added to wells at different concentrations, and 0.1% (v/v) DMSO was used as a control. After samples were incubated for 72 h, 20 µL of MTT solution (5 mg mL⁻¹) was added to each well, and the plate was incubated for an additional 4 h. The formazan was dissolved in 100 µL of DMSO. The absorbance (OD) was quantified with a microplate spectrophotometer at 570 nm. Wells containing no drugs were used as blanks for the spectrophotometer. The survival of the cells was expressed as a percentage relative to the untreated control wells. All experiments were performed in triplicate.

Figure 3.4: Top1-mediated DNA cleavage assay. (A) Inhibition of Top1 relaxation activity at 200 and 500 µM. Lane 1, DNA alone; lane 2, DNA + Top1; lanes 3-16, DNA + Top1 + CPT or test compounds (7a, 8a, 7b, 8b, 7c, 8c). (B) Inhibition of Top1 relaxation activity at 200 µM. Lane 1, DNA alone; lane 2, DNA + Top1; lanes 3-21, DNA + Top1 + CPT or test compounds (7a-7r, respectively). (C) Inhibition of Top1 relaxation activity at 50, 100, and 200 µM. Lane 1, DNA alone; lane 2, DNA + Top1; lanes 3-11, DNA + Top1 + CPT or test compounds (7a, 7i).

3.5.6 SAR of the S-enantiomer

N-substituted (S)-THA derivatives were prepared with benzoyl, acyl, cyano, and alkyl derivatives (Figure 3.1).

Figure 3.5: Synthesis of N-substituted THA derivatives.
Reagents and conditions: (a) STR1, KPi buffer, pH 7.0, 5 °C; (b) 5% Na2CO3, 70 °C; (c) H2, Pd, rt; (d) glucosidase, acetate buffer, pH 5.0, 37 °C; (e) NH4OAc, EtOH, reflux; (f) p-TsOH, toluene, 90 °C; (g) acyl or alkyl halides, t-BuOK, THF, rt; (h) acetic acid, water, reflux.

Figure 3.6: MD simulations of 7a and 8a in Top I (PDB ID: 1k4t). (A) Three evenly spaced snapshots taken from the simulations of 8a in the color-coded order of green, red, and purple. (B) Three evenly spaced snapshots taken of the simulation with compound 7a, in the color-coded order of green, red, and purple. (C) Final snapshot of the simulations of 8a. (D) Final snapshot of the simulation of 7a. In all of the pictures, the protein and most of the DNA have been removed for clarity.

Figure 3.7: (A) Superimposed docked complexes of 7i (green) and CPT (purple). (B) Comparison of the final snapshot of 7i (green) with CPT (purple) in a Top1-DNA complex. (C) Binding mode of compound 7i and 3D superimposed structure of the final simulated snapshot (grey). In PDB entry 1k4t, Met 230 and Lys 334 correspond to Met 428 and Lys 751, respectively.

Table 3.2: Antiproliferative Activity of the N-Substituted (S)-THA Derivatives against HepG2

Compound    Cytotoxicity (IC50, µM)
7a          9.1 ± 0.02
7b          16.6 ± 0.08
7c          23.5 ± 0.09
7d          7.5 ± 0.01
7e          9.7 ± 0.03
7f          5.6 ± 0.01
7g          8.3 ± 0.02
7h          12.3 ± 0.02
7i          1.8 ± 0.01
7j          5.2 ± 0.03
7k          18.2 ± 0.03
7l          9.6 ± 0.02
7m          24.6 ± 0.15
7n          na
7o          33.5 ± 0.24
7p          na
7q          na
7r          na
CPT         0.70 ± 0.004

Chapter 4

Applied Drug Discovery: Topoisomerase II

4.1 Preface

All the wet-lab experiments in this chapter were performed by the Dong group.

4.2 Introduction

This chapter describes the drug discovery programme executed to discover catalytic inhibitors of TOP II.
This chapter is broken into seven sections: binding site identification, 1st round screening, 2nd round screening, 3rd round predictions, methodology, results and discussion, and finally conclusion.

4.2.1 Binding Site Identification

To determine the location of an efficacious binding site on TOP IIα, the protein was visualized in Molecular Operating Environment (MOE) [49] (PDB ID: 4FM9). This is the first and perhaps most important step in drug discovery, as a binding site which is too polar will demand polar molecules, which may have difficulty crossing the several membranes a small molecule must pass to find its target, as well as a short half-life due to efficient excretion and a large desolvation penalty preventing a favourable binding affinity. On the other hand, a pocket which is too hydrophobic will demand hydrophobic molecules, which may be promiscuous or toxic. Furthermore, the pocket should be sufficiently deep to anchor the small molecule into the protein. After examining the van der Waals protein surface, a pocket with favorable qualities was discovered on the surface of the protein, with a convex hull volume of 1671.88 Å³, 54% polar residues, 14% aromatic residues (F, Y, H, W), and 26 residues available for binding (calculated with PockDrug) [98]. This pocket also happened to be right in the middle of the DNA binding site, making for a very promising druggable site. The pocket was validated with MOE's site-finder probe software, which characterizes candidate small molecule binding pockets by calculating how well the protein can accommodate pseudo-atom probes. To the best of our knowledge, this is the first time that this site on TOP II has been targeted. This pocket was assessed with a molecular dynamics simulation to determine its flexibility. A highly flexible binding site is usually not optimal for small molecule targeting, and should show a large ΔRMSD throughout a time-dependent simulation.
After a 46 nanosecond simulation, the RMSD was observed to change by less than 1 Å (Figure 4.2), further validating the choice of pocket. None of the previously characterized ligand pockets were suitable due to the unfavourable chemistries described below. The binding site of the catalytic inhibitor ICRF is one example; unfortunately this site is significantly smaller than that mentioned above, with a convex hull volume of 1371.31 Å³ (Figure 4.1). It is also unfavorable as the pocket is deep inside the protein [32]. Due to the small binding site volume available, molecules which bind to this site have a very restricted medicinal chemistry scope and cannot tolerate many modifications. The other option would be to use the same site as etoposide, which is bound between a cleaved Deoxyribonucleic Acid (DNA) structure as well as the surrounding TOP II protein [99]. This would run counter to our motivation for finding a catalytic inhibitor, as this site has been directly implicated in poisoning [45]. As there are two isoforms of TOP II in humans, α and β, it is at this stage that one isoform may be selectively inhibited over the other. To do this, there must be some difference between the residues of each pocket site. In the bottom of the pocket of TOP IIα is PHE818, which in the case of β is LEU. Here, TOP IIα may be specifically targeted with ligands that favourably interact with aromatic groups. With this pocket being directly in the DNA binding site, it was hypothesized that if a small molecule was discovered which could in fact bind to such a site, it would likely block DNA binding, precluding the formation of dsDNA breaks and thus making it a catalytic inhibitor.

Figure 4.1: The van der Waals surface of ICRF is shown in blue, with ICRF inside, with respect to the ATPase domain of TOP IIα.

Figure 4.2: RMSD of pocket residues from Amber simulation of Topoisomerase IIα. Residues used were: 432-434, 350-359, 321-323, 299-303, 286-289, 273-284, 223-231, and 219-222. Each frame represents 2500 steps at 2 femtoseconds each.

Figure 4.3: Summary of the CADD pipeline. (Top) Compound 23 is shown as the first hit discovered from the pipeline. This is fed back into the next pipeline for round 2, resulting in the discovery of compound 112, shown docked into the pocket (Bottom).

4.3 First Round Screening

Following binding site identification, we moved on to molecular screening (Figure 4.3, top). The first and most coarse filter is to select docking candidates out of 400 million compounds from the ZINC Is Not Commercial (ZINC)-15 library [1]. For this we choose molecules which are available for purchase and have drug-like properties (as determined by Lipinski's rule of 5). This narrows the dataset down to 6 million molecules, which we use as input to the molecular docking programs Glide-SP and E-Hits [47],[45]. The two programs have complementary approaches to molecular docking, as described in Section 2.1.1. Although these docking programs have been shown to dock crystal structure ligands back into their receptors with 0.3 Å RMSD difference to the original, the scoring functions are still inaccurate by a few kcal/mol [45]. Therefore, it was reasoned that the strengths of molecular docking would be leveraged by comparing the RMSD between molecules docked with E-Hits and those docked with Glide, where an RMSD below 2 Å is used as the threshold for accepting or rejecting a docked candidate. This still resulted in a large number of molecules, and so the molecules were ranked and thresholded by their Glide docking score, keeping those with scores less than -5.0 (lower score is better) and discarding the rest. This left about 300 molecules.
Finally, the Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) software was used with the ADMET Risk, TOX Risk, and CYP Risk scores, where a higher score means the molecule is predicted to be more toxic [47]. Molecules with scores greater than 6.5, 3.3, and 1 for the descriptors ADMET Risk, TOX Risk, and CYP Risk, respectively, were discarded as recommended by the software manual. With this final ranking in hand, the top 53 molecules were ordered and tested in vitro. A supercoiled plasmid relaxation assay was used as an initial, coarse filter to determine candidate small molecules (Figure 4.4). Small molecule candidates that were sufficiently active in the first assay were then subjected to the kDNA decatenation assay. The decatenation assay is a more challenging requirement for TOP II as it relies on the enzyme relaxing multiple plasmids simultaneously, as opposed to one at a time, as is done in the relaxation assay (Figure 4.5). Out of those tested, compounds 19 and 23 (Figure 4.6, Figure 4.3) were found to be most active.

Figure 4.4: First round relaxation assay results. Compound concentrations are 5 µM. Here D denotes DMSO treated with topoisomerase, Etop denotes etoposide, and - denotes no topoisomerase added. Red font indicates active compounds.

Figure 4.5: First round kDNA decatenation assay results. Compound concentrations are 5 µM. Here D denotes DMSO, L denotes linear DNA, and Etop denotes etoposide.

Figure 4.6: Compound 19.

4.4 Second Round Screening

With the discovered active compounds 19 and 23, a new screen was initiated to look for similar molecules. For this, the shape similarity screening program ROCS [56] was used. ROCS is a fast shape comparison algorithm that uses Gaussian-based atom representations with chemical properties from the Implicit Mills Dean force field; it essentially tries to maximize the volume of overlap between chemically similar atoms as determined by the force field (Section 2.2.1). The pose chosen for the ROCS search was the docking conformation. The ZINC-15 library of 6 million molecules previously mentioned was used for this screening. The top 62 molecules were hand-picked based on their ROCS similarity score as well as their docking scores. All of these compounds were ordered and tested in the decatenation assay. Testing in the relaxation assay was stopped as it had too many false positives. Following experimental validation, we observed a large enrichment in the proportion of active molecules, close to 10-fold in comparison to the first round (Figure 4.7). Molecules were labelled as active based on their activity (the inverse intensity of relaxed DNA from the decatenation assay) being at least that of compound 23, with added consultation from the experimentalists of the Dong lab. The most potent inhibitor discovered was compound 112, followed closely by compound 60. Although these two compounds are quite similar, the difference lies in the presence of an extra methoxy group on compound 112, which extends out of the pocket and into the general DNA binding site area, consistent with the hypothesized mechanism of action (Figure 4.9, Figure 4.8). Furthermore, compounds were labelled partially active based on consultation with experimentalists in the Dong lab. Figures of the decatenation assay results may be found in the Appendix (Figure A.1, Figure A.2, Figure A.3, Figure A.4, Figure A.5, Figure A.6).

Figure 4.7: Summary of active and inactive compounds per round.

Figure 4.8: Compound 60.

Figure 4.9: Compound 112.

Figure 4.10: MTS assay with results for HeLa cells treated with compound 23. R1881 is an AR agonist and MDV is an AR antagonist; both are used here as controls, and no effect is expected.

4.5 Wet lab Characterization of active compounds

In support of the hypothesized mechanism of action, it can be seen that the lead compounds do not kill HeLa cells in an MTS assay (Figure 4.10) and therefore have low cytotoxicity. This lack of cytotoxicity is further verified by the absence of H2Aγ in a western blot of cellular protein extract (Figure 4.12). However, the active compounds were observed to have a repressive effect on cellular proliferation as measured by the BrdU assay. In contrast to MTS, BrdU measures proliferation by reading out the replication rate of DNA directly through a fluorescent base-pair analogue. When K562 cells, a cell line that over-expresses TOP II, are treated with the active compounds in the BrdU assay, proliferation is repressed in a way comparable to etoposide (Figure 4.11). This result is believed to be due to the newly discovered family of active compounds acting as potent catalytic inhibitors (Figure 4.25). We expect catalytic inhibitors to stall cell growth, for example through the decatenation checkpoint, while avoiding cell killing. To gain further insight into the observed inhibition of cell growth, cells were treated with compound or vehicle, stained with 4′,6-diamidino-2-phenylindole (DAPI), and visualized under the fluorescence microscope (Figure 4.13). Here it can be seen that the cellular nucleus is enlarged due to the pause in the cell cycle, likely at the decatenation checkpoint. The nuclei become larger as they have double their chromosomes, but cannot yet divide (Figure 4.13).

Table 4.1: Decatenation assay based IC50 in nM.

Compound    IC50 (nM)
60          181
23          506
ICRF        25

Figure 4.11: BrdU assay with compound 60 and 110.

Figure 4.12: H2Aγ western blot. Cells were treated with etoposide (positive control), ICRF (negative control), or Top60 (compound 60). Concentrations are in µM.

Figure 4.13: Fluorescence microscopy of HeLa cells treated with compound 60 or ICRF.

4.6 Third Round Predictions

With a dataset of 115 compounds now generated, we can turn to machine learning methods to help us discover more active compounds.

4.6.1 Dataset Preparation

Each data point consisted of a 2-tuple: a molecule and its corresponding measured intensity value, proportional to the amount of relaxed DNA left in the gel after treatment with the molecule. To turn this regression problem into a binary decision problem the intensities were thresholded, where only molecules better (with higher intensity) than compound 23 were labelled as active, and everything else inactive. Originally this dataset had 115 molecules, with 12 active compounds.

Most of the compounds tested, including all of the active molecules, contain a pyridinothione functionality, which may be present in its keto or enol tautomeric forms (Figure 4.14). Because of this tautomerism, such a molecule may appear in our training dataset, or in a query dataset, in either its keto or enol form, and of course both are desirable. To ensure that the machine learning algorithm recognizes each tautomer equivalently, all the molecules with the pyridone and pyridinothione were enumerated with both tautomers. This is essentially a form of dataset augmentation. The augmentation led to a database of 157 molecules. The dataset was split into a train and a test set, with 20% of the molecules taken at random for the test set. Additionally, compound 60 (in both tautomeric forms) was always included in the test set, as it was one of the most potent compounds and thus important to predict correctly. N-bit binary vectors with radius r were used as input, created by the ECFP algorithm (Section 2.2.1) as implemented in RDKit.

Figure 4.14: Pyridinothione tautomerisation. Left is the keto form and right is the enol.
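The augmentation and split described above can be illustrated with a small standard-library sketch. Everything named here is an assumption made for illustration: the record fields, the `has_tautomers` flag (standing in for a real substructure match and tautomer enumeration, which would need a cheminformatics toolkit), the toy compound IDs, and the choice to count the forced compound-60 records towards the 20% quota, which the text does not specify.

```python
import random


def enumerate_tautomers(compounds):
    """Emit one record per tautomeric form; both forms inherit the parent's label."""
    records = []
    for compound in compounds:
        forms = ("keto", "enol") if compound["has_tautomers"] else ("as_drawn",)
        for form in forms:
            records.append({
                "compound_id": compound["compound_id"],
                "form": form,
                "active": compound["active"],
            })
    return records


def split_train_test(records, test_fraction=0.2, forced_test_ids=("60",), seed=0):
    """Random split in which every record of a forced compound lands in the test set."""
    forced = [r for r in records if r["compound_id"] in forced_test_ids]
    others = [r for r in records if r["compound_id"] not in forced_test_ids]
    rng = random.Random(seed)
    rng.shuffle(others)
    # Assumption: forced records count towards the test quota.
    n_random = max(0, round(test_fraction * len(records)) - len(forced))
    train = others[n_random:]
    test = forced + others[:n_random]
    return train, test


# Hypothetical toy dataset: ten compounds, five of which have two tautomers.
compounds = [
    {"compound_id": str(i), "has_tautomers": i % 2 == 0, "active": i == 60}
    for i in range(56, 66)
]
records = enumerate_tautomers(compounds)
train, test = split_train_test(records)
```

Fixing the random seed keeps the split reproducible, which matters when the same test set is later used to compare fingerprint settings.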
For N and r we scanned the values [256, 512, 1024, 2048] and [2, 3, 4, 5] respectively, and used our test set to validate the choice. Only 2-dimensional chemical input features were chosen because of their proven robustness and to avoid over-fitting; in contrast, the usual approach is to use as many descriptors as possible. The hyper-parameter choice was validated with 10-fold cross validation based on F1 score.

4.6.2 Decision Boundary Tightening

During the process of developing a Machine Learning (ML) algorithm or pipeline, one might notice that the algorithm consistently misclassifies a certain type of data. Of course, to notice this misclassification the algorithm developer needs expert knowledge of the data. These misclassifications may arise because the algorithm is under- or over-expressive, or because there is not enough data, but they may also arise from the nature of high dimensional feature spaces. Consider a binary classification problem where the data live in a p-dimensional hypercube. Our data is distributed such that all of the positive examples are separated from the negative examples by an inscribed p-dimensional hypersphere: all the positive examples lie between the sphere and the cube, and all the negative examples lie in the sphere. In p dimensions the volume of a sphere is $V_{sphere} = \frac{2 r^{p} \pi^{p/2}}{p\,\Gamma(p/2)}$ and that of a cube is $V_{cube} = (2r)^{p}$, where r is the radius. Taking the ratio of the hypersphere volume to the hypercube volume as the number of dimensions increases, a startling observation is made (Equation 4.1):

\[
\frac{V_{sphere}}{V_{cube}} = \frac{\pi^{p/2}}{p\,\Gamma(p/2)\,2^{p-1}} \tag{4.1}
\]

From this plot (Figure 4.15) we can see that in low dimensions most of our points are inside the sphere, as expected, since it takes up most of the space. However, with large dimensionality, like the space where our features live, most of the points will be outside of the sphere and in the small margin where the positive class was.
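Equation 4.1 can be evaluated numerically to see how quickly the ratio collapses. The sketch below is illustrative only; the function names are hypothetical, and the computation is done in log space (using `math.lgamma`) because the gamma function and the powers involved overflow double precision at fingerprint-sized dimensions.

```python
import math


def log_sphere_to_cube_ratio(p):
    """Natural log of Equation 4.1: pi^(p/2) / (p * Gamma(p/2) * 2^(p-1))."""
    return (
        (p / 2) * math.log(math.pi)
        - math.log(p)
        - math.lgamma(p / 2)
        - (p - 1) * math.log(2)
    )


def sphere_to_cube_ratio(p):
    """Volume of the inscribed p-sphere divided by the volume of the p-cube."""
    return math.exp(log_sphere_to_cube_ratio(p))


# Low dimensions, then the fingerprint lengths scanned in Section 4.6.1.
for p in (2, 3, 4, 10, 256, 2048):
    print(p, sphere_to_cube_ratio(p), log_sphere_to_cube_ratio(p) / math.log(10))
```

For p = 2 and p = 3 the ratio reduces to the familiar π/4 and π/6, and by p = 10 it has already dropped below 1%. At the fingerprint lengths used here, the third column (the base-10 logarithm) is the more useful quantity, since the ratio itself underflows towards zero.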
Another way of viewing this is that as the dimensionality is increased, almost all of the volume comes to lie in the corners of the hypercube, with almost no volume inside the inscribed sphere. It should be emphasized that only about four dimensions are needed for the situation to become pathological. Indeed, our everyday experiences do not prepare us for high dimensional space. This demonstrates that in high dimensions points become very sparse and lie near the edge of the data manifold, so a large amount of training data (an amount exponential in the number of dimensions) is needed, especially on the decision boundaries of the manifold, to make them robust in high dimensions.

Figure 4.15: Illustration of the curse of dimensionality. The ratio between the volume of a hypersphere and that of a hypercube, where the sphere is inside the cube, is shown as dimensionality increases. Here r = 1.

This is an issue in cheminformatics as the datasets are generally quite small, and the problem cannot be solved by simply getting more data, because data is not simple to acquire. The approach taken to counter this problem is to find misclassified points at under-trained parts of the decision boundary and re-label them. This approach decreases the amount of data necessary by using data points which are more important (data-efficient). The algorithm one must follow is then simple:

1. Train machine learning algorithms
2. Run the algorithms on a query dataset
3. Look through positive examples, identify the mis-labelled samples, add them to the training step and repeat step 1.

Clearly a decision must be made to determine which molecules were correctly and incorrectly labelled.
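The three steps above can be sketched as a short loop. This is a toy illustration under stated assumptions, not the implementation used in this work: `train_fn` and `review_fn` are hypothetical callables, the expert review is reduced to a simple rule, and a one-dimensional nearest-neighbour classifier stands in for the real models.

```python
def tighten_decision_boundary(train_fn, training_set, query_set, review_fn, max_rounds=5):
    """Retrain after relabelling predicted actives that review flags as inactive.

    training_set: list of (features, is_active) pairs
    train_fn(training_set) -> predict(features) -> bool
    review_fn(features) -> True when the reviewer judges a predicted active
        to be an obvious inactive
    """
    data = list(training_set)
    model = train_fn(data)                      # step 1: train
    for _ in range(max_rounds):
        known = {features for features, _ in data}
        mislabelled = [                         # steps 2 and 3: query, then review
            x for x in query_set
            if model(x) and x not in known and review_fn(x)
        ]
        if not mislabelled:
            break
        data.extend((x, False) for x in mislabelled)
        model = train_fn(data)                  # repeat step 1 with the new labels
    return model, data


def train_nearest_neighbour(data):
    """Toy 1-D classifier: a query takes the label of its closest training point."""
    points = list(data)

    def predict(x):
        _, label = min(points, key=lambda item: abs(item[0] - x))
        return label

    return predict


# Hypothetical 1-D "descriptor": actives cluster near 5, inactives near 1 to 2.
training_set = [(5.0, True), (5.5, True), (1.0, False), (2.0, False)]
query_set = [0.5, 4.0, 5.2, 9.0, 30.0]
# Stand-in for expert review: anything far beyond the known actives is rejected.
model, data = tighten_decision_boundary(
    train_nearest_neighbour, training_set, query_set, review_fn=lambda x: x > 8.0
)
```

Before tightening, the toy model calls both 9.0 and 30.0 active simply because nothing inactive was ever seen on that side of the actives. After one round they are labelled inactive and the boundary on that side closes in. The `review_fn` callback is where the expert decision discussed in the text enters.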
Although this does require expert knowledge, in practice there are some obvious misclassifications due to the above-mentioned issues (Figure 4.15) which can easily be re-labelled.

4.6.3 Machine Learning Pipeline Development

Based on the 157 compound dataset, the first round of training was initiated with various machine learning classifiers, namely NB, linear SVM, RBF kernel SVM, and KNN (Section 2.2), and a majority voting scheme to predict the actives. CV with F1 score was done on a dataset of 124 molecules, 21 of which were active, while the testing set had 33 molecules, 5 of which were active. Once the best hyper-parameter combinations were found, the algorithms were trained on all available data. For both rounds, a query database of a ~12 million molecule subset of the ZINC-15 database was created based on molecules being available in stock and having a molecular weight greater than 250 g/mol. The predicted actives were then examined to determine whether they were inactive, and if so they were added to the dataset. After the first round, 118 molecules were labelled using the decision boundary tightening method, with a final dataset size of 275 molecules. The test set contained 11 active molecules out of 57 molecules in total, and in the training set there were 15 actives out of 218 molecules. A majority vote was again used with all the classification algorithms to infer activity. To discover the best hyper-parameters, a 10-fold CV based grid search was done on all combinations of the available parameters. However, in the case of Random Forests the number of possible hyper-parameter combinations was much bigger than for the other algorithms, and so two phases of training were used. The first consisted of a random hyper-parameter search with 5-fold CV done for 100 iterations.
Following this, the best discovered fingerprint parameters were used for a more thorough 200-iteration hyper-parameter search of the algorithm parameter space with 10-fold CV.

4.7 Methodology

4.7.1 Methodological Details of in-silico Techniques

Protein Preparation

The 4fm9 Protein Data Bank (PDB) entry for TOP II was downloaded and prepared with the Schrödinger Maestro protein preparation software module [95]. Briefly, all original hydrogens were removed and re-added, crystallographic ions and waters were removed, and the propka optimizer at pH 7.4 was used to calculate the amino acid protonation states. The structure was then minimized using the OPLS-3 force field. The MOE site-finder algorithm, which uses virtual atom probes to search the protein surface, was used to help propose suitable pockets [100].

Docking and Chemical Similarity Search

Docking Grid Calculation: The grid was centered on the aforementioned pocket and calculated with Schrödinger's grid generation program [45], with all other parameters set to default. Glide Ligand Docking: Standard precision docking was used with all the other parameters set to their default values [45]. E-Hits Docking: The top 100,000 molecules scored and ranked by Glide were docked with the E-Hits program [47] with all settings set to default. ROCS shape similarity: The ROCS shape similarity program offered by OpenEye was used to look for molecules similar to our lead compound [56]. Here the Implicit Mills Dean force field was used with all other parameters as default. Pocket Volume Calculation: The PockDrug software was used to calculate the pocket volume [98].

Molecular Dynamics Simulations

Amber Simulation of 4fm9 pocket residues: The protein was minimized for 200 cycles with the Newton-Raphson algorithm. The system was heated to 300 K with no pressure control, using the SHAKE algorithm to freeze the bonds to hydrogen [101]. A Langevin thermostat was used for this purpose. Generalized Born implicit solvent was used with PBradii [55].
After heating, a production run was executed with 2 femtosecond time steps, writing every 2500 steps. The simulation was run for 9359 steps. OPLS-3 Simulation of compound 23 in TOP II: The simulation was executed as previously described in Section 3.5.2.

Machine Learning

All algorithms mentioned in the text come from the implementation in scikit-learn. Hyper-parameters were validated with 10-fold CV unless mentioned otherwise. See text for further details.

4.7.2 Wet-Lab Experimental Methodology

Relaxation Assay

In the relaxation assay, negatively supercoiled DNA plasmids are exposed to TOP II with and without the inhibitor candidates. If the molecules work as inhibitors, it is expected that more DNA will remain supercoiled (that is, less will be relaxed) than in the control without inhibitor. This is visualized with an agarose DNA gel, where DNA is separated based on its charge and size. As supercoiled DNA has less surface area than relaxed DNA, bands from supercoiled DNA migrate lower (faster) than those from relaxed DNA.

Decatenation Assay

In the decatenation assay, plasmids that are catenated to each other are the target for relaxation. This mess of DNA is harder to disentangle than supercoiled DNA is to relax, and this is thus a less permissive assay. Molecules that are active in the relaxation assay are then used in the decatenation assay. In this case the catenated DNA migrates more slowly, while the relaxed DNA migrates more quickly because it consists of single DNA plasmids. The IC50 was measured in the decatenation and relaxation assays for the most promising compounds. To determine the efficacy of the discovered molecules against cells, two assays were used: 3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium (MTS) and Bromodeoxyuridine (BrdU). While the MTS assay is only capable of elucidating the proportion of live to dead cells, BrdU tells us whether the rate of DNA replication decreases.
This difference is important, as a catalytic inhibitor is expected to have low efficacy in terms of cell killing but is expected to slow down cell growth. Finally, to further assess the effect of the discovered inhibitors on cell division, the cell nucleus was visualized under a fluorescence microscope with DAPI staining.

MTS Assay

Cytotoxicity assays were performed using the commercial kit from Pierce (CAT#8078). Briefly, culture media was collected and used to measure lactate dehydrogenase (LDH) levels by a colorimetric method following the manufacturer's protocol.

BrdU Assay

Cell proliferation rates were measured using the CellTiter 96 AQueous One kit (Promega) and the bromodeoxyuridine (BrdU) assay kit (Millipore) according to the manufacturer's protocol, with minor modifications as previously described (PMID: 27659047).

Fluorescence Microscopy

Cells were fixed with 4% paraformaldehyde, treated in 0.25% Triton X-100 for 15 mins, incubated with F-actin conjugated to Phalloidin-iFluor 488 (Abcam; Cambridge, UK), and mounted with DAPI staining mount (Vector Labs; Burlingame, CA, USA). Cells were then imaged by confocal microscopy at 63X magnification (Zeiss LSM 780; Carl Zeiss AG; Oberkochen, Germany).

4.8 Results and Discussion

Screening

Round one of virtual screening produced two active molecules, 19 and 23 (23: Figure 4.3 (top), 19: Figure 4.6). It can be seen that many molecules passed the relaxation assay (Figure 4.4). However, the decatenation assay proved to be more stringent, with only two hits (Figure 4.5). As 19 was observed to be toxic, compound 23 was pursued for a ROCS based shape similarity screening. Round two of virtual screening proved highly fruitful, as the percentage of actives increased by around 10-fold (Figure 4.7).
With these results, a database of 273 molecules, as described above, was built and used in the aforementioned machine learning pipeline. From the first iteration of the ML pipeline, 129 molecules were predicted to be active out of the 12 million compounds screened from ZINC-15. All of the actives in the training set were recovered from the query database (this is because of overlapping molecules between the database used here for query and that used for docking). However, 111 of the predicted actives were obviously not active and were re-labelled for the next round. Examples of these 'obvious' inactives include molecules which are too small or too big for the binding site, or which have exotic chemical functionality that was never seen in training. Some of these molecules were artefacts of the small and biased dataset. This is because most of the active molecules in the dataset contained the same pharmacophore, and also because, while there are fewer actives than inactives, the actives are still over-represented relative to their actual likelihood amongst the set of all molecules. These problems are all related to the fact that log-likelihood assumes samples are Independent and Identically Distributed (IID), while in this dataset they clearly are not. They are not independent, as the discovery of an active scaffold drove a search for similar scaffolds, and they are surely differently distributed, as each round had its own selection criteria. For example, the model would often predict molecules from the ZINC-15 set to be active which had an R4 group that was the same as in other active compounds (Figure 4.25) but which were missing the entire pyridine ring. The issue here is that all actives have the pyridine ring, but many inactives also have it, so it is not the main discriminating feature. It is made even worse in this case as all the molecules in the partially active dataset were labelled as inactive. This may have been avoided by setting up a regression task instead of a classification task.
In this way intermediate values have some meaning as lying between the highly active and inactive molecules. Of course, no such meaning is present if one were simply to extend the number of classes. Unfortunately this was not possible due to the absence of experimentally consistent numerical values on which to do regression. Some other molecules that appeared in the active dataset were small and highly halogenated. The only plausible explanation for this is that since these molecules and their chemistries were not seen during training, they are almost as likely to be classified as active as inactive. Re-annotating this specific data makes the machine learning classifier more robust than re-annotating random data, as this directly changes the margins used for classification and makes them tighter. In the second round a database of 273 molecules was trained on, and 13 molecules were predicted to be active. Unfortunately the predictions made from this round have not yet been tested, preventing an experimentally validated analysis of the approach. From the perspective of ML there are a few important things of note. Accuracy was observed to be a generally uninformative metric, as is evident from the consistently high accuracy scores, which seem to be almost independent of the other scores used (Figure 4.16). Quite often some amount of over-fitting can be seen in the training set; however, there was not enough in any case to preclude its use for inference.

Figure 4.16: Accuracy of various machine learning classifiers for round 1 and 2.

Effectiveness of the Decision Boundary Tightening Method

Most of the plots show (Figure 4.19, Figure 4.17, Figure 4.20, Figure 4.16) that the testing scores have mainly increased after the second round, lending some support to this hypothesis. However, it is not clear that this methodology has done exactly what was hoped, as this increased score may simply be due to increasing the dataset size by ~2-fold.
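To make concrete why accuracy can look high regardless of the other scores on an imbalanced test set, the metrics compared in these figures can be computed directly from a confusion matrix. The sketch below uses only the standard library; the function name and both sets of counts are hypothetical. The only figure taken from the text is the test-set composition of 11 actives among 57 molecules.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, F1, Matthews correlation (MCC) and Cohen's kappa from raw counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc, "kappa": kappa}


# Hypothetical classifier A: predicts every one of the 57 molecules inactive.
always_inactive = classification_metrics(tp=0, fp=0, fn=11, tn=46)
# Hypothetical classifier B: recovers 8 of the 11 actives with 3 false positives.
reasonable = classification_metrics(tp=8, fp=3, fn=3, tn=43)
```

Classifier A reaches an accuracy above 80% while its F1, MCC and kappa are all zero. Classifier B improves accuracy by less than ten percentage points, yet its F1 is 8/11. The chance-corrected and class-sensitive scores therefore separate the two far more clearly than accuracy does.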
This is in part because what was hoped for from decision boundary tightening has not been calculated directly; to do this one would need to integrate the space enclosed by the decision boundary, which in most cases is intractable.

MD based SAR

To understand the binding efficiency of the discovered small molecules, an MD simulation of compound 23 was performed for 30 nanoseconds. Throughout the simulation, H-bonding interactions are observed with G737, Q742, N786, Q789, G793, N795, N867, N868, and R945, with the strongest interactions, in order, being G737 > G742 > N795 > Q789 > G793 > G868 (Figure 4.21). Interestingly, the Q789 interaction was found to be mutually exclusive with G793. This represents two alternative conformations observed throughout the simulation, where the hydrogen bond donor between rings 2 and 3 moves between the sidechain carbonyl of Q789 and the backbone carbonyl of G793 after 22 nanoseconds (Figure 4.21). The second conformation is believed to represent a stable conformation, as the RMSD does not change much at this point (Figure 4.22). This observation prompted us to visualize and analyze the protein-ligand interactions after the 22 nanosecond cut-off. The amide next to pyridinothione interacts stably with the protein throughout the simulation, with the carbonyl hydrogen bonded to the backbone of G868, and with the NH of the ligand bonded through a very stable water bridge to N786 (Figure 4.23). The strongest interactions are shown below (Figure 4.24). In comparing the final snapshot from the simulation with the docked structure, some induced fitting is observed, where N786 has swung over towards the ligand to form a hydrogen bonding interaction through a water bridge with the carbonyl of the ligand amide. Overall the protein structure has deformed in a way favourable to the ligand, as is apparent from the 3.38 Å ΔRMSD with the initial docked structure. The stable conformation observed later in the simulation has about 1.5 Å ΔRMSD with the initial docked conformation; however, the first observed conformation is almost the same as the docked conformation, at 0.5 Å ΔRMSD, where the variation is due to some minor readjusting in the pocket.

Figure 4.17: F1 Score of various machine learning classifiers for round 1 and 2.

Figure 4.18: Average F1 Score from 10-fold CV for various machine learning classifiers for round 1 and 2.

Figure 4.19: MCC for various machine learning classifiers for round 1 and 2.

Figure 4.20: Kappa score for various machine learning classifiers for round 1 and 2.

Figure 4.21: Interaction Count Based Histogram. Counts are marked at every time point for each residue (below) and the total number of interactions is histogrammed above. Note the mutual exclusivity between GLN 789 and GLY 793.

Figure 4.22: The ligand RMSD and protein RMSD throughout the simulation. The ligand RMSD is shown in red, and stabilizes after 22 nanoseconds; the protein RMSD is shown in blue.

Figure 4.23: Protein-Ligand Interaction Histogram of specific interaction types.

Figure 4.24: Protein-Ligand Interaction Diagram. Only interactions that occur for more than 20% of the simulation are shown.

Figure 4.25: Active Scaffold (Table 4.2).

The R2 substituent

The R2 substituent of the active scaffold (Figure 4.25) is quite interesting as it has a unique chemistry. Firstly, the tautomer shown in Figure 4.25 is in the enol form as a thiol or an alcohol, although it could be active in its keto form. Indeed, as the active site region in which R2 sits is positively charged due to lysine (Figure 4.3), the enolate might even be favoured in the case that R2 is a thiol (Figure 4.14). However, previous energetic calculations along with experimental evidence show that the keto form is more stable in polar solvents, while the enol form is more stable in non-polar solvents. This is also the case for pyridinone [102], [103].
This substituent is mainly active when it is sulphur; however, it may also be active with oxygen, as evidenced by compound 90's activity (Table 4.2). Interestingly, compound 72 is partially active. It is almost the same as 23, except that the methyl thioether is present on R2 instead of the thioamide/thiol. This is evidence that a hydrogen bond acceptor might not be necessary in the pocket, and so it is possible that the active tautomer is actually the thione. However, to be sure, more medicinal chemistry must still be done. Compound 79 supports the claim that the molecule must sit in a relatively constricted site, such as the proposed binding site, as the phenyl substituent (and other bigger substituents) on the sulfur render the molecule inactive.

Figure 4.26: Compound 88. This compound is relevant for the SAR; it is shown separately as it has a different scaffold from the molecules explored in Table 4.2.

The R1 and X substituents

Although R1 seems to be in a narrow area of the pocket of TOP II, it can still tolerate mild substitution. A methyl substituent was active, whereas a chlorine substituent was not. It is hard to compare these two molecules because X and R4 also differ, and the difference in activity is as likely to be due to these changes as to the Cl-Me change. Indeed, it is possible that chlorine is tolerated in this part of the active site, as compound 88 (Figure 4.26) is active and has two chlorines, one in this position and one next to it (para to the amide).

The R3 substituent

Since the proposed mechanism of inhibition is through blocking binding to DNA, this substituent is believed to be extremely important, and indeed activity is quite sensitive to its substitution. The compounds in general are only active with the right type of steric bulk on this substituent. The most potent compounds follow the pattern: linker-ring. In all cases, the compounds are most active where the linker is 0 or 1 atoms (i.e. compounds 115, 71, Table 4.2). There is also a partially active compound with a two-atom spacer (compound 75), which appears necessary to accommodate its 7-membered ring. The most potent compounds, 112 and 60, both have a hydrogen bond donor at the linker position, likely making an H-bond contact with Gly 793, which was observed to be the strongest interaction in the simulation of compound 23. Some substitutions of these rings seem to be favourable as well, specifically at the meta or para positions, as there is some evidence that ortho substitutions may lead to a loss in activity (compound 74).

Figure 4.27: Compound 89. This compound is relevant for the SAR; it is shown separately as it has a different scaffold from the molecules explored in Table 4.2.

Alternative Scaffolds

Not many active scaffolds were found aside from the one described above (Figure 4.25). However, compound 89 (Figure 4.27) was found to be active. It contains a similar three-ring scaffold (i.e. the amide and pyridine ring), but with enough different features that it may be pursued in the case that the current lead scaffold has ADMET issues. After our first round, compound 19 was discovered, containing a different scaffold. However, it is now believed that its perceived activity was due to off-target effects, such as general affinity for DNA.

Table 4.2: List of actives, relevant inactives, and partial actives for the SAR. Thresholding was done based on the decatenation intensities, where anything less than the decatenation intensity of compound 23 is classified as active, anything between compound 23 and compound 19 is inactive, while the rest are partially active.
Variable groups map onto the molecule based on Figure 4.25.

Compound ID   Activity   R1   R2       X   R3             R4
22            0          H    -O       C   [structure]    H
23            2          H    -SH      C   [structure]    H
48            0          Cl   /        N   [structure]    /
60            2          H    -SH      C   [structure]    H
62            2          H    -SH      C   [structure]    H
63            2          H    -SH      C   [structure]    H
64            2          H    -SH      C   [structure]    H
65            0          H    -SH      C   [structure]    H
66            0          H    -Br      C   [structure]    H
67            1          H    -SH      C   [structure]    H
68            1          H    -SH      C   [structure]    H
69            1          H    -SH      C   [structure]    H
70            2          H    -SH      C   [structure]    H
71            0          H    -SH      C   [structure]    H
72            1          H    -S(Me)   C   [structure]    H
73            0          H    -SH      C   CH(CH3)OCH3    H
74            0          H    -SH      C   [structure]    H
75            1          H    -SH      C   [structure]    H
76            0          H    -SH      C   H              H
77            2          H    -SH      C   [structure]    Cl
79            0          H    -SPh     C   [structure]    H
90            2          Me   -O       C   [structure]    H
101           0          H    -SH      C   [structure]    H
107           0          H    -O       C   [structure]    H
110           2          H    -SH      C   [structure]    H
112           2          H    -SH      C   [structure]    H
113           2          H    -SH      C   [structure]    H
114           2          H    -SH      C   [structure]    H
115           0          H    -SH      C   [structure]    H

Chapter 5

Conclusion

5.0.1 Topoisomerase I

In summary, immobilized STR1 was utilized to enantioselectively synthesize a series of N-substituted (S)-THA derivatives and successfully provide a short and efficient chemoenzymatic approach to prepare polycyclic MIAs. These N-substituted (S)-THAs were identified as novel TOP I inhibitors. Interestingly, the TOP I and HepG2 inhibitory activities observed in vitro showed that the C-3(S) configuration of the N-substituted THAs is preferred for their bioactivity. This was supported by MD simulation models in which the R-enantiomer was pushed out of the pocket of TOP I while the corresponding S-enantiomer remained between the bases. Moreover, a SAR study of N-substituted (S)-THAs was conducted, and the TOP I inhibitor 7i was identified as the most potent analogue tested. The formation of a novel methionine-aromatic contact between 7i and TOP I was proposed to explain the observed activity. Indeed, this type of strong interaction is only beginning to be recognized as a stabilizing force in protein structures [104]. The results described here provide further insight into the biocatalytic character of STR1 and demonstrate an excellent opportunity for its use as a biocatalyst for the enantioselective construction of diversified MIAs with potentially meaningful bioactivity.

5.0.2 Topoisomerase II

In conclusion, novel TOP II inhibitors were discovered with the application of various virtual screening techniques.
With an IC50 of 181 nanomolar, compound 60 has the potential to be used in xenograft mouse models, and perhaps more. Finally, machine learning techniques proved to be promising for the discovery of new compounds, especially when the primary, simpler methods have been exhausted. Only when a sufficiently large database has been built do machine learning algorithms become powerful. Indeed, the most impressive machine learning algorithms train on datasets that have millions of data points, allowing for powerful algorithms and methods such as deep learning to be applied. Unfortunately, in the realm of small molecule based drug discovery, the datasets are often too small for a powerful algorithm. However, methods that are data-efficient, such as the decision boundary tightening methodology employed in this thesis, may be used to allow the application of more powerful algorithms to smaller datasets.

Future Directions

Medicinal Chemistry

After simulating compound 23 in the proposed binding site of Topoisomerase II, it became clear that relevant interactions between the ligand and protein were only induced after enough time in the simulation. Some examples of this include a water bridge which was present for over 50% of the simulation (Figure 4.24), and interactions with some aforementioned residues that were not present in the docked structure. In the future, it is this structure that should be used for docking proposed chemical derivatives.

Machine Learning

As the compounds proposed by the machine learning methods have not yet been tested, the efficacy of the trained algorithms is still not clear. Future work would require testing them and continually building the dataset.

Prostate Cancer

Transcription at AR promoters has been observed to be regulated in part by TOP II, and in addition TOP II inhibition with catalytic inhibitors has been demonstrated to reduce tumor volume in CRPC xenograft models [24], [26].
Towards this, some preliminary results have shown efficacy of compound 60 in reducing PSA levels, and this should be further explored in future studies.

Applications of lead compounds in Cancer Treatment
The scope of cancers that these compounds may treat is similar to that of etoposide; examples include lung cancer, ovarian cancer, testicular cancer, leukemia, and lymphoma. Since the activity of these chemicals is antagonistic to etoposide, they may be used to enable an increased dose of poisons, which has been done with ICRF [105]. As cancer cells that have less reliance on Topoisomerase II may be more reliant on Topoisomerase I, it is quite possible that catalytic inhibitors will form an effective synergy with Topoisomerase I poisons like camptothecins. Suramin is another known catalytic inhibitor, with off-target toxic side effects, which has shown efficacy in CRPC, providing more potential for the application of our inhibitors [106], [107], [108]. Furthermore, many cell lines resistant to alkylating agents such as cyclophosphamide and cisplatin show higher Topoisomerase II dependency; therefore these compounds may also be used in synergy with compounds such as cisplatin, which has previously been demonstrated to work well with etoposide in small cell lung cancer, bladder cancer, and cervical cancer [109].

Bibliography
[1] T. Sterling and J. J. Irwin, “ZINC 15–ligand discovery for everyone,” Journal of chemical information and modeling, vol. 55, no. 11, pp. 2324–2337, 2015.
[2] P. Axerio-Cilies, N. A. Lack, M. R. S. Nayana, K. H. Chan, A. Yeung, E. Leblanc, E. S. T. Guns, P. S. Rennie, and A. Cherkasov, “Inhibitors of androgen receptor activation function-2 (AF2) site identified through virtual screening,” Journal of medicinal chemistry, vol. 54, no. 18, pp. 6197–6205, 2011.
[3] Y. Cai, H. Zhu, Z. Alperstein, W. Yu, A. Cherkasov, and H.
Zou, “Strictosidine synthase triggered enantioselective synthesis of N-substituted (S)-3, 14, 18, 19-tetrahydroangustines as novel topoisomerase I inhibitors,” ACS chemical biology, vol. 12, no. 12, pp. 3086–3092, 2017.
[4] S. B. Edge and C. C. Compton, “The American Joint Committee on Cancer: the 7th edition of the AJCC cancer staging manual and the future of TNM,” Annals of surgical oncology, vol. 17, no. 6, pp. 1471–1474, 2010.
[5] A. K. Larsen, A. E. Escargueil, and A. Skladanowski, “Catalytic topoisomerase II inhibitors in cancer therapy,” Pharmacology & therapeutics, vol. 99, no. 2, pp. 167–181, 2003.
[6] D. Rogers and M. Hahn, “Extended-connectivity fingerprints,” Journal of chemical information and modeling, vol. 50, no. 5, pp. 742–754, 2010.
[7] Y. Pommier, E. Leo, H. Zhang, and C. Marchand, “DNA topoisomerases and their poisoning by anticancer and antibacterial drugs,” Chemistry & biology, vol. 17, no. 5, pp. 421–433, 2010.
[8] Y. Pommier, E. Leo, H. Zhang, and C. Marchand, “DNA topoisomerases and their poisoning by anticancer and antibacterial drugs,” Chemistry & biology, vol. 17, no. 5, pp. 421–433, 2010.
[9] Y. Pommier, “Drugging topoisomerases: lessons and challenges,” ACS chemical biology, vol. 8, no. 1, pp. 82–95, 2013.
[10] A. Heidenreich, G. Aus, M. Bolla, S. Joniau, V. B. Matveev, H. P. Schmid, and F. Zattoni, “EAU guidelines on prostate cancer,” European urology, vol. 53, no. 1, pp. 68–80, 2008.
[11] R. Siegel, J. Ma, Z. Zou, and A. Jemal, “Cancer statistics, 2014,” CA: a cancer journal for clinicians, vol. 64, no. 1, pp. 9–29, 2014.
[12] M. Horner, N. Neyman, R. Aminou, N. Howlader, S. Altekruse, E. Feuer, L. Huang, A. Mariotto, B. Miller, D. Lewis, et al., “SEER cancer statistics review, 1975-2006 2009: 1975-2006 [http://seer. 2006/]. National Cancer Institute, Bethesda,” MD based on November, 2008.
[13] J. E.
McNeal, “Regional morphology and pathology of the prostate,” American journal of clinical pathology, vol. 49, no. 3, pp. 347–357, 1968.
[14] M.-E. Taplin, B. Rajeshkumar, S. Halabi, C. P. Werner, B. A. Woda, J. Picus, W. Stadler, D. F. Hayes, P. W. Kantoff, N. J. Vogelzang, et al., “Androgen receptor mutations in androgen-independent prostate cancer: Cancer and Leukemia Group B study 9663,” Journal of clinical oncology, vol. 21, no. 14, pp. 2673–2678, 2003.
[15] W. D. Tilley, S. S. Lim-Tio, D. J. Horsfall, J. O. Aspinall, V. R. Marshall, and J. M. Skinner, “Detection of discrete androgen receptor epitopes in prostate cancer by immunostaining: measurement by color video image analysis,” Cancer research, vol. 54, no. 15, pp. 4096–4102, 1994.
[16] D. G. Bostwick, “High grade prostatic intraepithelial neoplasia. The most likely precursor of prostate cancer,” Cancer, vol. 75, no. S7, pp. 1823–1836, 1995.
[17] J. A. Macoska, M. A. Micale, W. A. Sakr, P. D. Benson, and S. R. Wolman, “Extensive genetic alterations in prostate cancer revealed by dual PCR and FISH analysis,” Genes, Chromosomes and Cancer, vol. 8, no. 2, pp. 88–97, 1993.
[18] R. Bhatia-Gaur, A. A. Donjacour, P. J. Sciavolino, M. Kim, N. Desai, P. Young, C. R. Norton, T. Gridley, R. D. Cardiff, G. R. Cunha, et al., “Roles for NKX3.1 in prostate development and cancer,” Genes & development, vol. 13, no. 8, pp. 966–977, 1999.
[19] P. M. Pierorazio, P. C. Walsh, A. W. Partin, and J. I. Epstein, “Prognostic Gleason grade grouping: data based on the modified Gleason scoring system,” BJU international, vol. 111, no. 5, pp. 753–760, 2013.
[20] J. Sturge, M. P. Caley, and J. Waxman, “Bone metastasis in prostate cancer: emerging therapeutic strategies,” Nature reviews Clinical oncology, vol. 8, no. 6, p. 357, 2011.
[21] G. L. Andriole and W. J. Catalona, “The diagnosis and treatment of prostate cancer,” Annual review of medicine, vol. 42, no. 1, pp. 9–15, 1991.
[22] D. B. Rukstalis, “Treatment options after failure of radiation therapy - a review,” Reviews in urology, vol. 4, no. Suppl 2, p. S12, 2002.
[23] B. J. Feldman and D. Feldman, “The development of androgen-independent prostate cancer,” Nature Reviews Cancer, vol. 1, no. 1, p. 34, 2001.
[24] B.-G. Ju, V. V. Lunyak, V. Perissi, I. Garcia-Bassets, D. W. Rose, C. K. Glass, and M. G. Rosenfeld, “A topoisomerase IIβ-mediated dsDNA break required for regulated transcription,” Science, vol. 312, no. 5781, pp. 1798–1802, 2006.
[25] M. C. Haffner, M. J. Aryee, A. Toubaji, D. M. Esopi, R. Albadine, B. Gurel, W. B. Isaacs, G. S. Bova, W. Liu, J. Xu, et al., “Androgen-induced TOP2B-mediated double-strand breaks and prostate cancer gene rearrangements,” Nature genetics, vol. 42, no. 8, p. 668, 2010.
[26] H. Li, N. Xie, M. E. Gleave, and X. Dong, “Catalytic inhibitors of DNA topoisomerase II suppress the androgen receptor signaling and prostate cancer progression,” Oncotarget, vol. 6, no. 24, p. 20474, 2015.
[27] P. L. Depowski, S. I. Rosenthal, T. P. Brien, S. Stylos, R. L. Johnson, and J. S. Ross, “Topoisomerase IIα expression in breast cancer: correlation with outcome variables,” Modern Pathology, vol. 13, no. 5, p. 542, 2000.
[28] C. A. Felix, C. P. Kolaris, and N. Osheroff, “Topoisomerase II and the etiology of chromosomal translocations,” DNA repair, vol. 5, no. 9, pp. 1093–1108, 2006.
[29] B. D. Lovett, L. L. Nigro, E. F. Rappaport, I. A. Blair, N. Osheroff, N. Zheng, M. D. Megonigal, W. R. Williams, P. C. Nowell, and C. A. Felix, “Near-precise interchromosomal recombination and functional DNA topoisomerase II cleavage sites at MLL and AF-4 genomic breakpoints in treatment-related acute lymphoblastic leukemia with t(4;11) translocation,” Proceedings of the National Academy of Sciences, vol. 98, no. 17, pp. 9802–9807, 2001.
[30] A. M. Azarova, Y. L. Lyu, C.-P. Lin, Y.-C. Tsai, J. Y.-N. Lau, J. C. Wang, and L. F.
Liu, “Roles of DNA topoisomerase II isozymes in chemotherapy and secondary malignancies,” Proceedings of the National Academy of Sciences, vol. 104, no. 26, pp. 11014–11019, 2007.
[31] C. Bailly, “Contemporary challenges in the design of topoisomerase II inhibitors for cancer chemotherapy,” Chemical reviews, vol. 112, no. 7, pp. 3611–3640, 2012.
[32] S. Classen, S. Olland, and J. M. Berger, “Structure of the topoisomerase II ATPase region and its mechanism of inhibition by the chemotherapeutic agent ICRF-187,” Proceedings of the National Academy of Sciences, vol. 100, no. 19, pp. 10629–10634, 2003.
[33] M. Kawatani, H. Takayama, M. Muroi, S. Kimura, T. Maekawa, and H. Osada, “Identification of a small-molecule inhibitor of DNA topoisomerase II by proteomic profiling,” Chemistry & biology, vol. 18, no. 6, pp. 743–751, 2011.
[34] L. H. Jensen, K. C. Nitiss, A. Rose, J. Dong, J. Zhou, T. Hu, N. Osheroff, P. B. Jensen, M. Sehested, and J. L. Nitiss, “A novel mechanism of cell killing by anti-topoisomerase II bisdioxopiperazines,” Journal of Biological Chemistry, vol. 275, no. 3, pp. 2137–2146, 2000.
[35] P. B. Deming, C. A. Cistulli, H. Zhao, P. R. Graves, H. Piwnica-Worms, R. S. Paules, C. S. Downes, and W. K. Kaufmann, “The human decatenation checkpoint,” Proceedings of the National Academy of Sciences, vol. 98, no. 21, pp. 12044–12049, 2001.
[36] R. Ishida, T. Miki, T. Narita, R. Yui, M. Sato, K. R. Utsumi, K. Tanabe, and T. Andoh, “Inhibition of intracellular topoisomerase II by antitumor bis(2,6-dioxopiperazine) derivatives: mode of cell growth inhibition distinct from that of cleavable complex-forming type inhibitors,” Cancer Research, vol. 51, no. 18, pp. 4909–4916, 1991.
[37] D. J. Clarke, R. T. Johnson, and C. S. Downes, “Topoisomerase II inhibition prevents anaphase chromatid segregation in mammalian cells independently of the generation of DNA strand breaks,” Journal of cell science, vol. 105, no. 2, pp.
563–569, 1993.
[38] B. B. Hasinoff, K. Hellmann, E. H. Herman, and V. J. Ferrans, “Chemical, biological and clinical aspects of dexrazoxane and other bisdioxopiperazines,” Current medicinal chemistry, vol. 5, no. 1, pp. 1–28, 1998.
[39] M. E. Wall, “Camptothecin and taxol: discovery to clinic,” Medicinal research reviews, vol. 18, no. 5, pp. 299–314, 1998.
[40] N. Pastor, I. Domínguez, M. L. Orta, C. Campanella, S. Mateos, and F. Cortés, “The DNA topoisomerase II catalytic inhibitor merbarone is genotoxic and induces endoreduplication,” Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, vol. 738, pp. 45–51, 2012.
[41] P. E. Schroeder, P. B. Jensen, M. Sehested, K. F. Hofland, S. W. Langer, and B. B. Hasinoff, “Metabolism of dexrazoxane (ICRF-187) used as a rescue agent in cancer patients treated with high-dose etoposide,” Cancer chemotherapy and pharmacology, vol. 52, no. 2, pp. 167–174, 2003.
[42] O. Trott and A. J. Olson, “AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading,” Journal of computational chemistry, vol. 31, no. 2, pp. 455–461, 2010.
[43] J. B. Cross, D. C. Thompson, B. K. Rai, J. C. Baber, K. Y. Fan, Y. Hu, and C. Humblet, “Comparison of several molecular docking programs: pose prediction and virtual screening accuracy,” Journal of chemical information and modeling, vol. 49, no. 6, pp. 1455–1474, 2009.
[44] P. J. Goodford, “A computational procedure for determining energetically favorable binding sites on biologically important macromolecules,” Journal of medicinal chemistry, vol. 28, no. 7, pp. 849–857, 1985.
[45] R. A. Friesner, J. L. Banks, R. B. Murphy, T. A. Halgren, J. J. Klicic, D. T. Mainz, M. P. Repasky, E. H. Knoll, M. Shelley, J. K. Perry, et al., “Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy,” Journal of medicinal chemistry, vol. 47, no.
7, pp. 1739–1749, 2004.
[46] M. Regneri, “Finding all cliques of an undirected graph,” Seminar Current Trends in IE WS Jun, 2007.
[47] Z. Zsoldos, D. Reid, A. Simon, S. B. Sadjad, and A. P. Johnson, “eHiTS: a new fast, exhaustive flexible ligand docking system,” Journal of Molecular Graphics and Modelling, vol. 26, no. 1, pp. 198–212, 2007.
[48] L. Verlet, “Computer "experiments" on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules,” Physical review, vol. 159, no. 1, p. 98, 1967.
[49] Chemical Computing Group Inc., “Molecular operating environment (MOE),” 2016.
[50] C. R. Søndergaard, M. H. Olsson, M. Rostkowski, and J. H. Jensen, “Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values,” Journal of chemical theory and computation, vol. 7, no. 7, pp. 2284–2295, 2011.
[51] J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman, and D. A. Case, “Development and testing of a general amber force field,” Journal of computational chemistry, vol. 25, no. 9, pp. 1157–1174, 2004.
[52] R. L. Davidchack, R. Handel, and M. Tretyakov, “Langevin thermostat for rigid body dynamics,” The Journal of chemical physics, vol. 130, no. 23, p. 234101, 2009.
[53] D. J. Evans and B. L. Holian, “The Nose–Hoover thermostat,” The Journal of chemical physics, vol. 83, no. 8, pp. 4069–4074, 1985.
[54] G. J. Martyna, D. J. Tobias, and M. L. Klein, “Constant pressure molecular dynamics algorithms,” The Journal of Chemical Physics, vol. 101, no. 5, pp. 4177–4189, 1994.
[55] V. Tsui and D. A. Case, “Theory and applications of the generalized Born solvation model in macromolecular simulations,” Biopolymers: Original Research on Biomolecules, vol. 56, no. 4, pp. 275–291, 2000.
[56] P. C. Hawkins, A. G. Skillman, and A.
Nicholls, “Comparison of shape-matching and docking as virtual screening tools,” Journal of medicinal chemistry, vol. 50, no. 1, pp. 74–82, 2007.
[57] L. P. Hammett, “The effect of structure upon the reactions of organic compounds. Benzene derivatives,” Journal of the American Chemical Society, vol. 59, no. 1, pp. 96–103, 1937.
[58] L. Breiman, Classification and regression trees. Routledge, 2017.
[59] T. Hastie and Friedman, “The elements of statistical learning; data mining, inference and prediction,” 2002.
[60] S. J. Russell and P. Norvig, Artificial intelligence: a modern approach. Malaysia; Pearson Education Limited, 2016.
[61] E. Keogh and A. Mueen, “Curse of dimensionality,” in Encyclopedia of machine learning, pp. 257–258, Springer, 2011.
[62] Y. Sasaki et al., “The truth of the F-measure,” Teach Tutor mater, vol. 1, no. 5, pp. 1–5, 2007.
[63] B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta (BBA)-Protein Structure, vol. 405, no. 2, pp. 442–451, 1975.
[64] M. L. McHugh, “Interrater reliability: the kappa statistic,” Biochemia medica: Biochemia medica, vol. 22, no. 3, pp. 276–282, 2012.
[65] A. Nemes, “Monoterpenoid indole alkaloids, CNS and anticancer drugs,” Analogue-Based Drug Discovery II, pp. 189–215, 2010.
[66] J. Leonard, “Recent progress in the chemistry of monoterpenoid indole alkaloids derived from secologanin,” Natural Product Reports, vol. 16, no. 3, pp. 319–338, 1999.
[67] J. Stöckigt, A. P. Antonchick, F. Wu, and H. Waldmann, “The Pictet–Spengler reaction in nature and in organic chemistry,” Angewandte Chemie International Edition, vol. 50, no. 37, pp. 8538–8564, 2011.
[68] J. Stöckigt and S.
Panjikar, “Structural biology in plant natural product biosynthesis - architecture of enzymes from monoterpenoid indole and tropane alkaloid biosynthesis,” Natural product reports, vol. 24, no. 6, pp. 1382–1400, 2007.
[69] J. Stöckigt, L. Barleben, S. Panjikar, and E. A. Loris, “3D-structure and function of strictosidine synthase–the key enzyme of monoterpenoid indole alkaloid biosynthesis,” Plant Physiology and Biochemistry, vol. 46, no. 3, pp. 340–355, 2008.
[70] F. Wu, H. Zhu, L. Sun, C. Rajendran, M. Wang, X. Ren, S. Panjikar, A. Cherkasov, H. Zou, and J. Stöckigt, “Scaffold tailoring by a newly detected Pictet–Spenglerase activity of strictosidine synthase: from the common tryptoline skeleton to the rare piperazino-indole framework,” Journal of the American Chemical Society, vol. 134, no. 3, pp. 1498–1500, 2012.
[71] U. Pfitzner and M. Zenk, “Immobilization of strictosidine synthase from catharanthus cell cultures and preparative synthesis of strictosidine,” Planta medica, vol. 46, no. 09, pp. 10–14, 1982.
[72] E. A. Loris, S. Panjikar, M. Ruppert, L. Barleben, M. Unger, H. Schübel, and J. Stöckigt, “Structure-based engineering of strictosidine synthase: auxiliary for alkaloid libraries,” Chemistry & biology, vol. 14, no. 9, pp. 979–985, 2007.
[73] J. J. Maresh, L.-A. Giddings, A. Friedrich, E. A. Loris, S. Panjikar, B. L. Trout, J. Stöckigt, B. Peters, and S. E. O’Connor, “Strictosidine synthase: Mechanism of a Pictet-Spengler catalyzing enzyme,” Journal of the American Chemical Society, vol. 130, no. 2, pp. 710–723, 2008.
[74] L. Yang, H. Zou, H. Zhu, M. Ruppert, J. Gong, and J. Stöckigt, “Improved expression of His6-tagged strictosidine synthase cDNA for chemo-enzymatic alkaloid diversification,” Chemistry & biodiversity, vol. 7, no. 4, pp. 860–870, 2010.
[75] H. Zhu, P. Kercmar, F. Wu, C. Rajendran, L. Sun, M. Wang, and J.
Stöckigt, “Using strictosidine synthase to prepare novel alkaloids,” Current medicinal chemistry, vol. 22, no. 15, pp. 1880–1888, 2015.
[76] H.-B. Zou, H.-J. Zhu, L. Zhang, L.-Q. Yang, Y.-P. Yu, and J. Stöckigt, “A facile chemoenzymatic approach: One-step syntheses of monoterpenoid indole alkaloids,” Chemistry–An Asian Journal, vol. 5, no. 11, pp. 2400–2404, 2010.
[77] M. E. Wall, C. Cook, et al., “The isolation and structure of camptothecin, a novel alkaloidal leukemia and tumor,” J Am Chem Soc, vol. 88, no. 16, pp. 3888–3890, 1966.
[78] G. Capranico, J. Marinello, and G. Chillemi, “Type I DNA topoisomerases,” Journal of medicinal chemistry, vol. 60, no. 6, pp. 2169–2192, 2017.
[79] E. Martino, S. Della Volpe, E. Terribile, E. Benetti, M. Sakaj, A. Centamore, A. Sala, and S. Collina, “The long story of camptothecin: from traditional medicine to drugs,” Bioorganic & medicinal chemistry letters, vol. 27, no. 4, pp. 701–707, 2017.
[80] Y. H. Park, C. U. Chung, B. M. Park, M. R. Park, D. I. Park, J. Y. Moon, H. S. Park, J. H. Kim, S. S. Jung, J. O. Kim, et al., “Lesser toxicities of belotecan in patients with small cell lung cancer: a retrospective single-center study of camptothecin analogs,” Canadian respiratory journal, vol. 2016, 2016.
[81] K. Gokduman, “Strategies targeting DNA topoisomerase I in cancer chemotherapy: camptothecins, nanocarriers for camptothecins, organic non-camptothecin compounds and metal complexes,” Current drug targets, vol. 17, no. 16, pp. 1928–1939, 2016.
[82] C. Sheng, Z. Miao, and W. Zhang, “New strategies in the discovery of novel non-camptothecin topoisomerase I inhibitors,” Current medicinal chemistry, vol. 18, no. 28, pp. 4389–4409, 2011.
[83] G. M. Cragg and D. J. Newman, “A tale of two tumor targets: topoisomerase I and tubulin. The Wall and Wani contribution to cancer chemotherapy,” Journal of Natural Products, vol. 67, no. 2, pp. 232–244, 2004.
[84] B. Han, L. H. Stockwin, C.
Hancock, S. X. Yu, M. G. Hollingshead, and D. L. Newton, “Proteomic analysis of nuclei isolated from cancer cell lines treated with indenoisoquinoline NSC 724998, a novel topoisomerase I inhibitor,” Journal of proteome research, vol. 9, no. 8, pp. 4016–4027, 2010.
[85] L. S. Kurtzberg, S. D. Roth, R. D. Krumbholz, J. L. Crawford, C. Bormann, S. Dunham, M. Yao, C. Rouleau, R. G. Bagley, X.-J. Yu, et al., “Genz-644282, a novel non-camptothecin topoisomerase I inhibitor for cancer treatment,” Clinical Cancer Research, pp. clincanres–0542, 2011.
[86] G. Dong, C. Sheng, S. Wang, Z. Miao, J. Yao, and W. Zhang, “Selection of evodiamine as a novel topoisomerase I inhibitor by structure-based virtual screening and hit optimization of evodiamine derivatives as antitumor agents,” Journal of medicinal chemistry, vol. 53, no. 21, pp. 7521–7531, 2010.
[87] C. Erdelmeier, U. Regenass, T. Rali, and O. Sticher, “Indole alkaloids with in vitro antiproliferative activity from the ammoniacal extract of nauclea orientalis,” Planta medica, vol. 58, no. 01, pp. 43–48, 1992.
[88] D.-L. Chen, G.-X. Ma, M.-J. He, Y.-Y. Liu, X.-B. Wang, and X.-Q. Yang, “Anti-inflammatory activity of two new indole alkaloids from the stems of nauclea officinalis,” Helvetica Chimica Acta, vol. 99, no. 9, pp. 742–746, 2016.
[89] T. Au, H. Cheung, and S. Sternhell, “New corynanthe alkaloids from strychnos angustiflora,” Journal of the Chemical Society, Perkin Transactions 1, pp. 13–16, 1973.
[90] M. A. Brook, D. B. Maclean, H. L. Holland, et al., “A new route to the indolopyridonaphthyridine ring system: synthesis of N-benzyl-13b, 14-dihydronauclefine and N-benzyl-13b, 14-dihydroangustine,” Tetrahedron, vol. 43, no. 24, pp. 5761–5768, 1987.
[91] D. B. Repke, R. D. Clark, D. B. MacLean, et al., “Synthesis of naucléfine, angustidine, angustine, and (±)-13b, 14-dihydroangustine,” Journal of the Chemical Society, Chemical Communications, no. 6, pp. 439–440, 1988.
[92] D. B. Repke, R. D. Clark, J. T. Nelson, D. B. MacLean, et al., “Synthesis of naucléfine, angustidine, angustine, (±)-13b, 14-dihydro-angustine and naulafine,” Tetrahedron, vol. 45, no. 9, pp. 2541–2550, 1989.
[93] R. Brown, A. Charalambides, and H. Cheung, “Synthesis of dihydroangustine,” Tetrahedron letters, 1973.
[94] Á. Patthy-Lukáts, Á. Kocsis, L. F. Szabó, and B. Podányi, “Configurative correlation and conformational analysis of strictosidine and vincoside derivatives,” Journal of natural products, vol. 62, no. 11, pp. 1492–1499, 1999.
[95] G. M. Sastry, M. Adzhigirey, T. Day, R. Annabhimoju, and W. Sherman, “Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments,” Journal of computer-aided molecular design, vol. 27, no. 3, pp. 221–234, 2013.
[96] T. A. Halgren, R. B. Murphy, R. A. Friesner, H. S. Beard, L. L. Frye, W. T. Pollard, and J. L. Banks, “Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening,” Journal of medicinal chemistry, vol. 47, no. 7, pp. 1750–1759, 2004.
[97] E. Harder, W. Damm, J. Maple, C. Wu, M. Reboul, J. Y. Xiang, L. Wang, D. Lupyan, M. K. Dahlgren, J. L. Knight, et al., “OPLS3: a force field providing broad coverage of drug-like small molecules and proteins,” Journal of chemical theory and computation, vol. 12, no. 1, pp. 281–296, 2015.
[98] H. A. Hussein, A. Borrel, C. Geneix, M. Petitjean, L. Regad, and A.-C. Camproux, “PockDrug-Server: a new web server for predicting pocket druggability on holo and apo proteins,” Nucleic acids research, vol. 43, no. W1, pp. W436–W442, 2015.
[99] C.-C. Wu, T.-K. Li, L. Farh, L.-Y. Lin, T.-S. Lin, Y.-J. Yu, T.-J. Yen, C.-W. Chiang, and N.-L. Chan, “Structural basis of type II topoisomerase inhibition by the anticancer drug etoposide,” Science, vol. 333, no. 6041, pp. 459–462, 2011.
[100] P. Labute and M.
Santavy, “Locating binding sites in protein structures,” Journal of Chemical Computing Group, 2007.
[101] K. D. Hammonds and J.-P. Ryckaert, “On the convergence of the SHAKE algorithm,” Computer Physics Communications, vol. 62, no. 2-3, pp. 336–351, 1991.
[102] A. R. Katritzky, K. Jug, and D. C. Oniciu, “Quantitative measures of aromaticity for mono-, bi-, and tricyclic penta- and hexaatomic heteroaromatic ring systems and their interrelationships,” Chemical reviews, vol. 101, no. 5, pp. 1421–1450, 2001.
[103] L. Forlani, G. Cristoni, C. Boga, P. E. Todesco, E. Del Vecchio, S. Selva, and M. Monari, “Reinvestigation of the tautomerism of some substituted 2-hydroxypyridines,” Arkivoc, vol. 11, pp. 198–215, 2002.
[104] C. C. Valley, A. Cembran, J. D. Perlmutter, A. K. Lewis, N. P. Labello, J. Gao, and J. N. Sachs, “The methionine-aromatic motif plays a unique role in stabilizing protein structure,” Journal of Biological Chemistry, vol. 287, no. 42, pp. 34979–34991, 2012.
[105] P. B. Jensen and M. Sehested, “DNA topoisomerase II rescue by catalytic inhibitors: a new strategy to improve the antitumor selectivity of etoposide,” Biochemical pharmacology, vol. 54, no. 7, pp. 755–759, 1997.
[106] E. Calvo, J. Cortés, J. Rodríguez, M. Sureda, C. Beltrán, J. Rebollo, R. Martínez-Monge, J. María Berián, J. de Irala, and A. Brugarolas, “Fixed higher dose schedule of suramin plus hydrocortisone in patients with hormone refractory prostate carcinoma: a multicenter phase II study,” Cancer, vol. 92, no. 9, pp. 2435–2443, 2001.
[107] S. A. Grossman, S. Phuphanich, G. Lesser, J. Rozental, L. B. Grochow, J. Fisher, S. Piantadosi, and New Approaches to Brain Tumor Therapy CNS Consortium, “Toxicity, efficacy, and pharmacology of suramin in adults with recurrent high-grade gliomas,” Journal of clinical oncology, vol. 19, no. 13, pp. 3260–3266, 2001.
[108] J. J. Knox and M. J.
Moore, “Treatment of hormone refractory prostate cancer,” in Seminars in urologic oncology, vol. 19, pp. 202–211, 2001.
[109] J. R. Brahmer and D. S. Ettinger, “Carboplatin in the treatment of small cell lung cancer,” The oncologist, vol. 3, no. 3, pp. 143–154, 1998.

Appendix A
Supporting Materials

[Gel image. Lane labels: Veh, Etop, 19, 23, 53, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, Decatenated Marker, Linear Marker, kDNA, ICRF, Empty well. Band labels: Decatenated, kDNA]
Figure A.1: Second round decatenation results. Compounds are present at 5 µM. Here Veh denotes DMSO and Etop denotes etoposide.

[Gel image. Lane labels: Veh, Etop, 19, 23, Decatenated Marker, Linear Marker, kDNA, ICRF, 54, 65, 67, 68, 69, 70, 71, 72, 73, 66]
Figure A.2: Second round decatenation results. Compounds are present at 5 µM. Here Decat Marker denotes fully decatenated DNA from Top II treatment, kDNA denotes catenated DNA, Linear marker denotes linear DNA, and Etop denotes etoposide.

[Gel image. Lane labels: Etop, 19, 23, 74, 75, 76, 77, Veh, kDNA, Linear Marker, Decatenated Marker, ICRF]
Figure A.3: Second round decatenation results. Compounds are present at 5 µM. Here Decat Marker denotes fully decatenated DNA from Top II treatment, kDNA denotes catenated DNA, Linear marker denotes linear DNA, and Etop denotes etoposide.

[Gel image, "Chemical screening 78 to 90". Lane labels: Veh, Etop, ICRF, KDNA, Decatenated Marker, 78, 79, 80, 81, 90, 83, 84, 85, 86, 87, 88, 89, 82. Band labels: kDNA, Decatenated]
Figure A.4: Second round decatenation results. Compounds are present at 5 µM. Here Decat Marker denotes fully decatenated DNA from Top II treatment, kDNA denotes catenated DNA, and Etop denotes etoposide.

[Gel image, "Chemical screening 91 to 105". Lane labels: Veh, Etop, ICRF, KDNA, 91, 92, 95, 93, 103, 102, 96, 97, 98, 99, 100, 101, 94, 104, 105, Decatenated Marker. Band labels: kDNA, Decatenated]
Figure A.5: Second round decatenation results. Compounds are present at 5 µM. Here D denotes DMSO, L denotes linear DNA, and Etop denotes etoposide.

[Gel image, "Chemical screening 106 to 111". Lane labels: Veh, Etop, ICRF, KDNA, 106, 107, 110, 108, 109, 111, Decatenated Marker]
Figure A.6: Second round decatenation results.
Compounds are present at 5 µM. Here D denotes DMSO, L denotes linear DNA, and Etop denotes etoposide.

Figure A.7: Interaction frequency with key residues throughout the 100 ns simulation of Topoisomerase I (1k4t) with compound 7a. The colors are coded as follows: purple represents hydrophobic interactions, blue represents water bridges, and green represents hydrogen bonds. Interactions were analyzed with Schrödinger’s simulation interactions script.

Figure A.8: Protein-ligand interactions. Interactions present for over 30% of the simulation are visualized. Note the water bridge present for most of the dynamics. Interactions were analyzed with Schrödinger’s simulation interactions script.

Figure A.9: Compound 83. This compound is relevant for the SAR; it is shown separately as it has a different scaffold than the molecules explored in Table 4.2.
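
The occupancy thresholds quoted for the simulations (interactions present for over 30% of the trajectory in Figure A.8, and a water bridge present for over 50% of the simulation of compound 23) can be illustrated with a small calculation. The following is a minimal sketch only, not the Schrödinger script used in this work: it assumes each frame has already been reduced to a set of interaction labels, and the function names and the labels RES_A, RES_B, and RES_C are hypothetical placeholders rather than residues from the thesis.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def interaction_occupancy(frames: Iterable[Set[str]]) -> Dict[str, float]:
    """Return, for each interaction label, the fraction of frames containing it."""
    frames = list(frames)
    if not frames:
        return {}
    # Each frame is a set, so a label is counted at most once per frame.
    counts = Counter(label for frame in frames for label in frame)
    return {label: n / len(frames) for label, n in counts.items()}


def persistent_interactions(occupancy: Dict[str, float], threshold: float) -> List[str]:
    """Labels whose occupancy is strictly above the threshold, sorted by name."""
    return sorted(label for label, frac in occupancy.items() if frac > threshold)


# Hypothetical four-frame trajectory; the labels are illustrative placeholders.
example_frames = [
    {"RES_A:hydrophobic", "RES_B:water_bridge"},
    {"RES_B:water_bridge"},
    {"RES_B:water_bridge", "RES_C:hbond"},
    set(),
]
occupancy = interaction_occupancy(example_frames)
print(persistent_interactions(occupancy, threshold=0.30))  # ['RES_B:water_bridge']
```

In this toy trajectory the water bridge appears in three of four frames (75%), so it passes both the 30% and the 50% cut-offs, while the contacts seen in a single frame (25%) are filtered out.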

