Investigating ML potentials and deep generative models for efficient conformational sampling
Shenoy, Nikhil
Abstract
Efficiently sampling from the landscape of molecular conformations is an important task in computational drug discovery. Simulation approaches like Molecular Dynamics (MD) require an energy function that is fast, accurate, transferable, and scalable. Traditional energy functions like force fields are fast but inaccurate, while quantum mechanical (QM) methods are accurate but slow. Recently, machine learning (ML) potentials trained on datasets labeled with QM methods have become popular because they address this accuracy-speed trade-off. However, generating QM datasets is cost-intensive, and design choices made during generation, such as conformational diversity (coverage of the conformational landscape) and structural diversity (coverage of chemical space), can introduce biases into the dataset.
In the first part of the thesis, we explore the relationship between these dataset biases and ML potential generalization. We investigate these dynamics through two distinct experiments: a fixed-budget one, where the dataset size remains constant, and a fixed-molecular-set one, which holds structural diversity fixed while varying conformational diversity. Our results reveal a critical need for balanced structural and conformational diversity in QM datasets for optimal generalization, which current datasets lack. We believe these findings can inform future data generation and the development of ML potentials that generalize beyond their training data.
An alternative approach is to sample conformations directly from the molecular graph using deep generative models such as diffusion models. Existing competitive approaches either use expensive local-structure methods or rely on large general-purpose architectures without task-specific inductive biases. In the second part of the thesis, we challenge the status quo and develop a simple, scalable deep generative method, Equivariant Transformer Flow (ET-Flow), incorporating flow matching, an equivariant transformer, a harmonic prior, and an approximate optimal transport alignment. We achieve state-of-the-art performance on several molecular conformer generation benchmarks with significantly fewer parameters and inference steps than existing methods, highlighting the importance of inductive biases and well-informed modelling choices.
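The flow-matching recipe the abstract names can be illustrated with a minimal sketch: sample starting points from a harmonic prior (bonded atoms are correlated, so samples start roughly chain-like), permute them against the data batch with an approximate optimal transport assignment so the straight-line paths are short, then build the linear interpolant whose constant velocity is the regression target. This is an illustration only, not the ET-Flow implementation; all function names are hypothetical, the 1-D coordinates and chain graph are simplifications, and only standard numpy/scipy routines are used.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

def chain_laplacian(n):
    # Graph Laplacian of a simple chain "molecule" (atom i bonded to atom i+1).
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(1)) - A

def harmonic_prior(laplacian, n_samples, eps=0.5):
    # Draw samples from N(0, (L + eps*I)^-1): bonded atoms are correlated,
    # so sampled "conformations" respect the graph's connectivity
    # (1-D coordinates per atom here, purely for illustration).
    cov = np.linalg.inv(laplacian + eps * np.eye(laplacian.shape[0]))
    chol = np.linalg.cholesky(cov)
    return rng.standard_normal((n_samples, laplacian.shape[0])) @ chol.T

def ot_align(x0, x1):
    # Approximate optimal transport within a batch: permute the prior samples
    # to minimize total squared distance to the data samples, shortening the
    # straight-line paths the flow must learn.
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return x0[rows], x1[cols]

def flow_matching_batch(x0, x1):
    # Linear interpolant x_t = (1 - t) x0 + t x1; the regression target for
    # the learned velocity field is the constant displacement x1 - x0.
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1 - t) * x0 + t * x1
    return t, xt, x1 - x0

L = chain_laplacian(5)
x0 = harmonic_prior(L, n_samples=8)   # prior "conformations"
x1 = rng.standard_normal((8, 5))      # stand-in for data conformations
x0m, x1m = ot_align(x0, x1)
t, xt, v_target = flow_matching_batch(x0m, x1m)
```

A model would then be trained to predict `v_target` from `(xt, t)` plus graph features; at inference, integrating the learned velocity from a fresh prior sample produces a conformation.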
Item Metadata
Title | Investigating ML potentials and deep generative models for efficient conformational sampling
Creator | Shenoy, Nikhil
Publisher | University of British Columbia
Date Issued | 2024
Language | eng
Date Available | 2024-07-06
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0444096
Degree Grantor | University of British Columbia
Graduation Date | 2024-11
Scholarly Level | Graduate
Aggregated Source Repository | DSpace