
MIT’s DIFFDOCK Boosts the Molecular Docking Top-1 Success Rate from 23% to 38%

MIT researchers propose DIFFDOCK, a diffusion generative model that raises the molecular docking top-1 prediction success rate from the 23 percent achieved by state-of-the-art traditional docking approaches to 38 percent.

Market research firm Emersion Insights reports that global funding for AI-powered drug development topped US$4 billion in 2021, a 36 percent year-over-year increase, and is expected to continue its rapid growth. A critical component of computer-aided drug discovery is molecular docking, the task of predicting the binding structure of a small-molecule ligand to a protein. Although new approaches based on deep learning models run faster than traditional search-based methods, they have not significantly improved prediction accuracy.

In the new paper DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking, researchers from the Massachusetts Institute of Technology’s Computer Science & Artificial Intelligence Laboratory propose DIFFDOCK, a diffusion generative model (DGM) that raises the molecular docking top-1 prediction success rate from the 23 percent achieved by state-of-the-art traditional docking approaches to 38 percent.

The team summarizes their main contributions as follows:

  1. We frame the molecular docking task as a generative problem and highlight the issues with previous deep-learning approaches.
  2. We formulate a novel diffusion process over ligand poses corresponding to the degrees of freedom involved in molecular docking.
  3. We achieve a new state-of-the-art 38% top-1 prediction with RMSD<2Å on PDBBind blind docking benchmark, considerably surpassing the previous best search-based (23%) and deep learning methods (20%).
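To make the headline metric concrete, here is a minimal sketch of how a top-1 success rate at an RMSD threshold of 2 Å can be computed. The function names `rmsd` and `top1_success_rate` are illustrative and not taken from the paper's code, and the plain RMSD below assumes the atoms of both poses are listed in the same order; docking benchmarks typically use a symmetry-corrected heavy-atom RMSD, which this sketch does not implement.

```python
import math

def rmsd(pred, ref):
    """Plain RMSD (in Å) between two equally ordered lists of (x, y, z) atoms.

    Assumption: no symmetry correction and no alignment are applied.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pred, ref)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pred))

def top1_success_rate(top1_rmsds, threshold=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD below the threshold."""
    if not top1_rmsds:
        raise ValueError("need at least one complex")
    return sum(r < threshold for r in top1_rmsds) / len(top1_rmsds)

# Toy usage: two of four top-ranked poses fall under 2 Å.
example_rate = top1_success_rate([0.5, 1.9, 2.0, 3.0])
```

Under this definition, a 38 percent top-1 success rate means that for 38 percent of the test complexes the single highest-ranked predicted pose lies within 2 Å RMSD of the reference pose.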

The team frames molecular docking as a generative modelling problem: given a ligand and a target protein structure, the goal is to learn a distribution over ligand poses. They introduce a diffusion process over the degrees of freedom involved in docking: 1) the position of the ligand, 2) its orientation in the pocket, and 3) its torsion angles. DIFFDOCK samples poses by running the learned reverse diffusion process, which iteratively transforms an uninformed, noisy prior distribution over ligand poses into the model's output distribution.
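The pose parameterization described above can be sketched as a small data structure. This is an illustrative sketch, not the authors' implementation: the `Pose` class, the choice of a Gaussian prior for position with an arbitrary `sigma`, a uniformly random unit quaternion for orientation, uniform torsion angles, and the `step_fn` callable standing in for a learned denoising update are all assumptions made for the example.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Pose:
    translation: Tuple[float, float, float]      # ligand position: 3 degrees of freedom
    rotation: Tuple[float, float, float, float]  # unit quaternion: 3 degrees of freedom
    torsions: List[float]                        # one angle (radians) per rotatable bond

def degrees_of_freedom(pose: Pose) -> int:
    """Position (3) + orientation (3) + one per torsion angle."""
    return 3 + 3 + len(pose.torsions)

def sample_prior(num_torsions: int, sigma: float = 10.0, rng=None) -> Pose:
    """Draw an uninformed pose (assumed prior; sigma is an arbitrary choice)."""
    if rng is None:
        rng = random.Random()
    translation = tuple(rng.gauss(0.0, sigma) for _ in range(3))
    # Normalizing four Gaussian draws gives a uniformly random unit quaternion.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    rotation = tuple(c / norm for c in q)
    torsions = [rng.uniform(-math.pi, math.pi) for _ in range(num_torsions)]
    return Pose(translation, rotation, torsions)

def run_reverse_process(pose: Pose, step_fn: Callable[[Pose, float], Pose],
                        num_steps: int) -> Pose:
    """Iteratively refine a prior sample; step_fn is a hypothetical placeholder
    for one learned denoising update at noise level t in (0, 1]."""
    for step in range(num_steps, 0, -1):
        pose = step_fn(pose, step / num_steps)
    return pose
```

A ligand with four rotatable bonds would have ten degrees of freedom under this parameterization, far fewer than the full set of atomic coordinates, which is the motivation for diffusing over poses rather than over atom positions.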

The researchers also train a confidence model that scores the poses sampled from the DGM, so that the most likely sample can be selected as the final prediction.
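The selection step can be illustrated with a short sketch. The `confidence_fn` argument below is a hypothetical stand-in for the trained confidence model, and the dictionary-based poses are toy data invented for the example.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")

def rank_by_confidence(poses: Sequence[P],
                       confidence_fn: Callable[[P], float]) -> List[P]:
    """Return poses sorted from most to least confident."""
    return sorted(poses, key=confidence_fn, reverse=True)

def pick_top1(poses: Sequence[P], confidence_fn: Callable[[P], float]) -> P:
    """Return the single pose with the highest confidence score."""
    if not poses:
        raise ValueError("need at least one sampled pose")
    return max(poses, key=confidence_fn)

# Toy usage: each sampled pose carries a made-up confidence score.
sampled = [
    {"id": "pose_a", "score": 0.12},
    {"id": "pose_b", "score": 0.87},
    {"id": "pose_c", "score": 0.45},
]
best = pick_top1(sampled, lambda pose: pose["score"])
```

The top-1 success rate is then measured on the pose chosen this way, so the quality of the ranking matters as much as the quality of the samples themselves.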

In their empirical study, the team evaluated DIFFDOCK on molecular complexes from the PDBBind benchmark and compared it with state-of-the-art search-based methods such as SMINA and GLIDE, as well as the recent deep learning methods EquiBind and TANKBind.

DIFFDOCK surpassed all the baselines in these evaluations, achieving a 38.2 percent top-1 success rate when sampling 40 poses while also delivering much faster inference times.

This work presents an exciting breakthrough in molecular docking. The team believes DIFFDOCK can help existing real-world pipelines and open new research avenues for downstream task integration and protein-protein and protein-nucleic acid docking.

The paper DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


