DeepMind’s AlphaFold 2 grabbed headlines last year by leveraging a transformer-based model architecture to achieve atomic accuracy in protein structure prediction. While the development of deep neural networks (DNNs) has enabled significant performance improvements across a variety of natural language processing and computer vision tasks, AlphaFold’s success showed that DNNs can also be effective in solving challenging and important structural biology tasks.
AlphaFold 2 may be a game-changer in protein structure prediction, but its training and inference remain very time-consuming and expensive. To address this issue, a research team from the National University of Singapore, HPC-AI Technology Inc., Helixon and Shanghai Jiao Tong University has proposed FastFold, a highly efficient protein structure prediction model implementation for training and inference. FastFold can reduce AlphaFold 2’s overall training time from 11 days to 67 hours and achieve up to 9.5x speedups for long-sequence inference while scaling to an aggregate 6.02 PetaFLOPs with 90.1 percent parallel efficiency.

The team says FastFold is the first performance optimization method for the training and inference of protein structure prediction models. It significantly reduces the time and economic costs of AlphaFold model training and inference by applying large model training techniques such as parallelism strategies and communication optimization.
The researchers summarize their main contributions as:
- We optimize AlphaFold's operators based on AlphaFold-specific performance characteristics. Combined with kernel fusion, FastFold's kernel implementations achieve significant speedups (a rough sketch of the fusion idea follows this list).
- We propose Dynamic Axial Parallelism, which has lower communication overhead than other model parallelism methods. For communication optimization, the proposed Duality Async Operation implements computation-communication overlap in dynamic computational graph frameworks such as PyTorch.
- We successfully scale AlphaFold model training to 512 NVIDIA A100 GPUs, obtaining an aggregate 6.02 PetaFLOPs during training. The overall training time is reduced from 11 days to 67 hours, with significant economic cost savings. At the inference stage, FastFold achieves 7.5 ∼ 9.5× speedups for long sequences and makes inference over extremely long sequences possible.
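FastFold's actual CUDA kernels are not reproduced in the paper summary above, but the general idea of kernel fusion can be illustrated with a minimal, hypothetical PyTorch sketch: the bias-add, mask, and softmax that appear in AlphaFold-style attention are collapsed from several separate kernel launches into fewer fused ones. The function names below are illustrative and are not part of FastFold's API.

```python
import torch

# Unfused version: each line launches its own kernel and writes an
# intermediate tensor to GPU memory.
def attention_softmax_unfused(scores, bias, mask):
    scores = scores + bias                       # kernel 1
    scores = scores.masked_fill(~mask, -1e9)     # kernel 2
    return torch.softmax(scores, dim=-1)         # kernel 3

# TorchScript can fuse chains of elementwise ops like the bias-add and mask
# into fewer kernels, cutting launches and memory traffic -- the same
# motivation behind FastFold's hand-written fused CUDA kernels.
@torch.jit.script
def attention_softmax_fused(scores, bias, mask):
    scores = torch.where(mask, scores + bias, torch.full_like(scores, -1e9))
    return torch.softmax(scores, dim=-1)
```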

The conventional AlphaFold model has three parts. An embedding component encodes the target sequence's multiple sequence alignment (MSA) and template information into MSA representations, which contain the co-evolving information of all similar sequences, and pair representations, which contain the interaction information of residue pairs in the sequence. These representations are fed into Evoformer blocks, which process the MSA and pair representations through MSA Stack and Pair Stack operations. The resulting highly processed representations go to a structure module that outputs the protein's three-dimensional structure prediction.
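As a rough orientation, this three-stage data flow could be sketched as follows. The class and argument names are schematic placeholders for this article, not AlphaFold's or FastFold's actual code.

```python
import torch.nn as nn

# Schematic sketch of the three-stage AlphaFold pipeline described above.
# Shapes: s = number of aligned sequences in the MSA, r = number of residues,
# c_m / c_z = channel widths of the MSA and pair representations.
class AlphaFoldSketch(nn.Module):
    def __init__(self, embedder, evoformer_blocks, structure_module):
        super().__init__()
        self.embedder = embedder                   # MSA + templates -> representations
        self.evoformer_blocks = nn.ModuleList(evoformer_blocks)
        self.structure_module = structure_module   # representations -> 3D coordinates

    def forward(self, msa_features, template_features):
        # 1) Embedding: build the MSA representation [s, r, c_m] and the
        #    pair representation [r, r, c_z].
        msa_repr, pair_repr = self.embedder(msa_features, template_features)

        # 2) Evoformer stack: each block updates the MSA representation
        #    (MSA Stack) and the pair representation (Pair Stack), exchanging
        #    information between the two.
        for block in self.evoformer_blocks:
            msa_repr, pair_repr = block(msa_repr, pair_repr)

        # 3) Structure module: translate the processed representations into
        #    the predicted 3D structure.
        return self.structure_module(msa_repr, pair_repr)
```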


FastFold uses AlphaFold’s Evoformer backbone and reduces communication overhead by applying Dynamic Axial Parallelism, an innovative model parallelism strategy that outperforms the current standard, tensor parallelism, in scaling efficiency. For communication optimization, the team introduces a Duality Async Operation that implements computation-communication overlap in dynamic computational graph frameworks such as PyTorch.
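The article does not spell out how the Duality Async Operation is implemented; the snippet below is only a generic illustration of the mechanism it builds on, namely computation-communication overlap via asynchronous collectives in PyTorch. It assumes `torch.distributed` has been initialized with a GPU backend, and the helper name is hypothetical.

```python
import torch.distributed as dist

# Generic illustration of computation-communication overlap (not FastFold's
# Duality Async Operation): launch a collective asynchronously, run
# independent computation while it is in flight, then wait on the handle.
def overlapped_step(partial_result, independent_input, local_compute):
    # Start an all-reduce without blocking the Python thread.
    handle = dist.all_reduce(partial_result, op=dist.ReduceOp.SUM, async_op=True)

    # While the communication proceeds, compute something that does not
    # depend on the all-reduced tensor.
    other_result = local_compute(independent_input)

    # Block until the collective has finished before using its result.
    handle.wait()
    return partial_result, other_result
```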

In their empirical study, the researchers compared FastFold with AlphaFold and OpenFold. The results show that FastFold greatly reduces the time and economic costs of baseline protein structure prediction model training and inference, slashing overall AlphaFold training time from 11 days to 67 hours, achieving 7.5 ∼ 9.5× speedups for long-sequence inference, and scaling to an aggregate 6.02 PetaFLOPs with 90.1 percent parallel efficiency.
Overall, FastFold’s high model parallelism scaling efficiency validates it as an effective approach for addressing AlphaFold’s huge training and inference computation overhead.
The code is available on the project’s GitHub. The paper FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
