BIRS Workshop Lecture Videos

Pruning false positive cases from small molecule docking results using machine learning techniques
Arciniega, Marcelino


Over the last three decades, Computer Aided Drug Design (CADD) has positioned itself as one of the most useful approaches for supporting research at the early stages of the drug discovery process [1]. In particular, small molecule docking algorithms have been used extensively to identify the atomic interactions between a protein target and a candidate small molecule that support the formation of the protein-ligand complex. This evaluation relies on a scoring function that relates geometric patterns of the interacting molecules to free energy values. However, accurate prediction of the binding free energy, together with the conformation of the complex, remains an open problem [2]. As a consequence, docking results show a high rate of false positives. The high complexity of the physicochemical process, together with the vast amount of experimental structural information available, makes machine learning algorithms an attractive option [3, 4, 5]. In the present work, we briefly describe the problems associated with current docking scoring functions and propose pruning the false positive cases using machine learning algorithms. We then expose the limitations of the docking algorithm of AutoDock Vina [6] (not only of its scoring function) by analyzing a set of approximately 15,000 crystallographic complexes retrieved from the Protein Data Bank [7]. Finally, we present preliminary results obtained with our tools for false positive identification. Specifically, we show how a relatively simple Bayesian Network, based on interaction fingerprints, can be used to infer which fragment molecules (molecular weights in the range of 150-350 Da) are badly placed. Additionally, we present the results obtained with a Convolutional Neural Network that analyzes docking poses (molecules with molecular weights in the range of 150-850 Da). Both networks show promising results, improving Receiver Operating Characteristic metrics compared with the docking protocol alone.
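
As a rough illustration of the kind of comparison described above, the following sketch computes the area under the Receiver Operating Characteristic curve for a raw docking score and for a rescoring classifier on the same poses. It is not the authors' code: the function name `roc_auc`, the label convention, and every number in the example are assumptions made up for illustration, and the classifier probabilities stand in for the output of any model (Bayesian Network, Convolutional Neural Network, or otherwise).

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Higher scores are assumed to mean "more likely a correctly placed pose".
    A tie between a positive and a negative counts as half a correct ordering.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pose")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, made-up values for illustration only (not data from the talk).
# Assumed convention: label 1 = correctly placed pose, 0 = false positive.
labels = [1, 1, 1, 0, 0, 0]
# Docking scores in kcal/mol: more negative is better, so negate before ranking.
docking_scores = [-9.1, -7.2, -6.8, -8.5, -7.0, -6.0]
# Hypothetical classifier probabilities that each pose is correct.
classifier_probs = [0.9, 0.7, 0.35, 0.3, 0.4, 0.2]

auc_docking = roc_auc([-s for s in docking_scores], labels)
auc_rescored = roc_auc(classifier_probs, labels)
print(f"docking alone: {auc_docking:.3f}, with classifier: {auc_rescored:.3f}")
```

The pairwise form is quadratic in the number of poses, which is acceptable for a sketch; for sets on the scale of thousands of complexes a sort-based implementation would be the more practical choice.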

Acknowledgments: This project was supported by Dirección General de Asuntos del Personal Académico at Universidad Nacional Autónoma de México (PAPIIT-IA202917). The authors thank Dirección General de Cómputo y de Tecnologías de Información y Comunicación at Universidad Nacional Autónoma de México for granting the use of the supercomputer Miztli (LANCAD-UNAM-DGTIC-320).

[1] G. Sliwoski, S. Kothiwale, J. Meiler, E. W. Lowe Jr. Pharmacol. Rev., 66, 334-395, 2014.
[2] H. A. Carlson, et al. J. Chem. Inf. Model., 56, 1063-1077, 2016.
[3] M. Arciniega, O. F. Lange. J. Chem. Inf. Model., 54, 1401-1411, 2014.
[4] J. C. Pereira, et al. J. Chem. Inf. Model., 56, 2495-2506, 2016.
[5] M. Wójcikowski, P. J. Ballester, P. Siedlecki. Sci. Rep., 7, 46710, 2017.
[6] O. Trott, A. J. Olson. J. Comput. Chem., 31, 455-461, 2010.


Attribution-NonCommercial-NoDerivatives 4.0 International