The AlphaFold2 protein structure prediction model has revolutionized structural biology research by providing faster and more efficient ways to unravel protein structure-function relationships. The model's complex architecture, however, entails enormous memory and time demands during training, which limits its wider application in biocomputing and life science research.
In the new paper Efficient AlphaFold2 Training using Parallel Evoformer and Branch Parallelism, a Baidu research team presents a Parallel Evoformer architecture and a Branch Parallelism training strategy for efficient AlphaFold2 training. The novel approach improves AlphaFold2 training speed by up to 38.67 percent without sacrificing accuracy.

The team summarizes their main contributions as follows:
- We improve the Evoformer in AlphaFold2 to a Parallel Evoformer, which breaks the computational dependency between the MSA and pair representations; experiments show this does not affect accuracy (see the sketch after this list).
- We propose Branch Parallelism for the Parallel Evoformer, which splits the different computing branches across more devices to run in parallel and thus speeds up training. This overcomes the scaling limitation of the pure data parallelism used in the official AlphaFold2 implementation.
- We reduce the end-to-end training time of AlphaFold2 to 4.18 days on UniFold and 4.88 days on HelixFold, improving training performance by 38.67% and 36.93%, respectively. This makes AlphaFold2 training markedly more efficient and reduces R&D costs for biocomputing research.
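To picture the first contribution, here is a minimal PyTorch sketch of the Parallel Evoformer idea described above. It is an illustration only: the linear layers are toy stand-ins for the real MSA-stack, pair-stack, and outer-product-mean modules, and all names are hypothetical rather than taken from the paper's code. The key point is that both branches read the block's incoming activations, so the pair branch no longer waits for the updated MSA.

```python
import torch
import torch.nn as nn

class ParallelEvoformerBlockSketch(nn.Module):
    """Toy illustration of the Parallel Evoformer idea: both branches
    consume the block's *inputs*, so neither waits for the other."""

    def __init__(self, d_msa: int = 64, d_pair: int = 32):
        super().__init__()
        # Toy stand-ins for the real sub-modules (row/column attention,
        # triangle updates, transitions, outer product mean).
        self.msa_update = nn.Linear(d_msa, d_msa)
        self.pair_update = nn.Linear(d_pair, d_pair)
        self.msa_to_pair = nn.Linear(d_msa, d_pair)

    def forward(self, msa: torch.Tensor, pair: torch.Tensor):
        # Serial Evoformer: the pair branch consumes the *updated* MSA,
        # so the two branches must run one after the other.
        # Parallel Evoformer (as described in the article): both branches
        # start from the incoming msa/pair, removing that dependency.
        new_msa = msa + self.msa_update(msa)
        m = msa.mean(dim=0)                        # (n_res, d_msa), averaged over sequences
        outer = m.unsqueeze(0) + m.unsqueeze(1)    # crude (n_res, n_res, d_msa) mixing term
        new_pair = pair + self.pair_update(pair) + self.msa_to_pair(outer)
        return new_msa, new_pair

# Shapes: msa is (n_seq, n_res, d_msa); pair is (n_res, n_res, d_pair).
msa, pair = torch.randn(8, 16, 64), torch.randn(16, 16, 32)
new_msa, new_pair = ParallelEvoformerBlockSketch()(msa, pair)
```

Because the pair update depends only on the block's incoming MSA, the two residual updates can be scheduled independently, which is exactly what the Branch Parallelism described below exploits.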
Introduced by UK AI company DeepMind, AlphaFold2 is an end-to-end protein structure prediction system that directly predicts the 3D coordinates of all atoms in a given protein. While AlphaFold2 has achieved unprecedented accuracy in protein structure prediction, training a model from scratch takes about 11 days on 128 TPUv3 cores, making it exceptionally time- and compute-expensive.

The team proposes two optimization techniques to make AlphaFold2 training more efficient: 1) a Parallel Evoformer, which converts the two serial computing branches in the existing Evoformer block into a parallel structure; and 2) a Branch Parallelism (BP) technique for the Parallel Evoformer, which speeds up training by distributing the now-independent branches across additional devices on top of standard data parallelism.
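A rough sense of how Branch Parallelism exploits this independence can be given with a single-process PyTorch sketch. This is an assumption-laden illustration, not the paper's implementation: the device strings, module names, and the use of asynchronous CUDA kernel launches to overlap the two branches are placeholders, and the real systems (UniFold, HelixFold) implement BP inside their distributed training stacks together with data parallelism and gradient synchronization.

```python
import torch
import torch.nn as nn

class BranchParallelBlockSketch(nn.Module):
    """Toy illustration of Branch Parallelism: once the MSA and pair branches
    are independent (Parallel Evoformer), each branch can be placed on its own
    device and their work can overlap."""

    def __init__(self, msa_branch: nn.Module, pair_branch: nn.Module,
                 msa_device: str = "cuda:0", pair_device: str = "cuda:1"):
        super().__init__()
        self.msa_branch = msa_branch.to(msa_device)
        self.pair_branch = pair_branch.to(pair_device)
        self.msa_device, self.pair_device = msa_device, pair_device

    def forward(self, msa: torch.Tensor, pair: torch.Tensor):
        # CUDA kernels launch asynchronously: after the first branch is queued
        # on its device, the host immediately queues the second branch on the
        # other device, so the two branches compute concurrently.
        msa_out = self.msa_branch(msa.to(self.msa_device, non_blocking=True))
        pair_out = self.pair_branch(pair.to(self.pair_device, non_blocking=True))
        return msa_out, pair_out

# Usage (requires at least 2 GPUs); the branch modules here are arbitrary stand-ins.
if torch.cuda.device_count() >= 2:
    block = BranchParallelBlockSketch(nn.Linear(64, 64), nn.Linear(32, 32))
    msa_out, pair_out = block(torch.randn(8, 16, 64), torch.randn(16, 16, 32))
```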
The team evaluated their approach in extensive experiments on two AlphaFold2 implementations built on different deep learning frameworks: UniFold (PyTorch) and HelixFold (PaddlePaddle). The results show that the Parallel Evoformer and Branch Parallelism strategy cuts training time to under five days on both systems, improving training performance by 38.67 percent on UniFold and 36.93 percent on HelixFold.
This work demonstrates that the proposed approach can significantly boost AlphaFold2’s training efficiency. The team hopes their contribution can push progress in the important research field of protein structure prediction.
The paper Efficient AlphaFold2 Training using Parallel Evoformer and Branch Parallelism is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
