
Yann LeCun Hails MSA Transformer’s ‘Huge Progress’ in Protein Contact Prediction

UC Berkeley, Facebook AI Research and New York University researchers’ Multiple Sequence Alignments (MSA) Transformer surpasses current state-of-the-art unsupervised structure learning methods by a wide margin.

A transformer-based model that achieves state-of-the-art performance on unsupervised protein structure learning is making waves, with esteemed AI researcher Yann LeCun and others in the machine learning and biology communities celebrating the new study.

The development of protein biomolecules, or protein engineering, requires a holistic understanding of protein structure. Because sequence variation within a protein family conveys information about the protein's structure, approaches to learning protein structure have tended to fit a separate model to each family of sequences. A prominent method is the Potts model, which explicitly models the energy landscape of a single family but must be fit independently to each one. (A related line of work, the Neural Potts Model, instead trains a single network with shared parameters across multiple protein families.)
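To make the per-family paradigm concrete, here is a minimal sketch, not the authors' code, of a Potts model's energy function over one aligned sequence. In practice the fields `h` and couplings `J` are fit to a family's MSA, for example by pseudolikelihood maximization, and contacts are read off from the strength of the learned couplings; all names and dimensions below are illustrative.

```python
import numpy as np

A, L = 21, 128              # alphabet size (20 amino acids + gap), alignment length
h = np.zeros((L, A))        # per-position fields
J = np.zeros((L, L, A, A))  # pairwise couplings between alignment positions

def potts_energy(seq: np.ndarray) -> float:
    """Energy (negative unnormalized log-probability) of one aligned sequence."""
    e = -h[np.arange(L), seq].sum()          # field contributions
    for i in range(L):                       # coupling contributions
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e

seq = np.random.randint(0, A, size=L)        # a random aligned sequence
print(potts_energy(seq))                     # 0.0 with all-zero parameters
```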

More recently, thanks to the availability of large unlabelled protein databases generated by the sequencing of hundreds of bacterial genomes, a new approach has emerged. Protein language modelling fits large neural networks with parameters shared across millions of diverse sequences, presenting a promising unsupervised approach for distilling the fundamental features of a protein.

Although unsupervised protein language models show strong performance, they can take only a single sequence as input at inference time, so everything they know about a protein family must be stored in their parameters, which therefore must be numerous. Potts models are superior in this respect: because inference is performed directly on a set of related sequences, they can extract covariation signals straight from the input.

The proposed Multiple Sequence Alignments (MSA) Transformer combines the two paradigms. Introduced by researchers from UC Berkeley, Facebook AI Research and New York University, the model takes sets of aligned sequences as input, but shares parameters across many diverse sequence families.


Generally speaking, the MSA Transformer extends transformer pretraining from single sequences to MSAs, the algorithmic alignments of related biological sequences such as proteins. Transformers are powerful sequence models because they construct a pairwise interaction map between all positions in a sequence, making them an ideal form for modelling residue-residue contacts. However, a standard transformer cannot take a multiple sequence alignment as input and so cannot extract information from related sequences at inference time. To overcome this, the MSA Transformer architecture interleaves attention across the rows and columns of the alignment, as in axial attention, letting the model draw on the whole alignment at inference time and thereby improving parameter efficiency (a minimal sketch of this interleaved attention follows the figure below).

Figure: (left to right) sparsity structure of the attention; untied row attention, which uses a different attention map for each sequence in the MSA; a single MSA Transformer block.
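The sketch below, not the authors' implementation, shows one such interleaved block in PyTorch: attention runs first along each row (within a sequence), then along each column (across sequences at one alignment position). Note that the actual model ties the row-attention maps across sequences to save memory, which this untied sketch omits; module names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class AxialMSABlock(nn.Module):
    """One block: self-attention along rows (within each sequence), then
    along columns (across sequences at each position), then a position-wise
    feed-forward layer, all with residual connections."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (rows R, columns C, dim D) — one embedded MSA
        rows = x                                  # batch of R rows, each of length C
        x = x + self.row_attn(rows, rows, rows, need_weights=False)[0]
        cols = x.transpose(0, 1)                  # batch of C columns, each of length R
        x = x + self.col_attn(cols, cols, cols,
                              need_weights=False)[0].transpose(0, 1)
        return x + self.ffn(x)

msa = torch.randn(64, 128, 256)        # 64 aligned sequences, 128 positions, dim 256
print(AxialMSABlock(256)(msa).shape)   # torch.Size([64, 128, 256])
```

Because row and column attention each scale with one axis of the alignment rather than with the full flattened MSA, this interleaving keeps the attention cost tractable for deep alignments.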

To assess performance, the researchers trained an MSA Transformer with 100M parameters on a large dataset (4.3 TB) of 26 million MSAs, with an average of 1192 sequences per MSA.

Figure: (left to right) top-L long-range contact precision (higher is better) for the MSA Transformer vs. the Potts model and ESM-1b on 14,842 proteins; long-range contact precision of the MSA Transformer, ESM-1b and the Potts model as a function of MSA depth.

On the task of unsupervised contact prediction, the MSA Transformer outperformed both the state-of-the-art transformer protein language model ESM-1b (Rives et al., 2020) and Potts models by a wide margin across all MSA depths.
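For readers unfamiliar with how attention maps yield contacts: the standard recipe in this line of work is to symmetrize each attention map, apply average product correction (APC), and combine the corrected maps into a single per-residue-pair score. Below is a hedged sketch of that post-processing; the random `attn` tensor stands in for real attention maps, and the simple averaging at the end replaces the small learned regression typically used to weight the maps.

```python
import torch

def apc(scores: torch.Tensor) -> torch.Tensor:
    """Average product correction for one (L, L) score matrix."""
    col_mean = scores.mean(dim=0, keepdim=True)      # (1, L)
    row_mean = scores.mean(dim=1, keepdim=True)      # (L, 1)
    return scores - row_mean * col_mean / scores.mean()

def contacts_from_attention(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_maps, L, L) row-attention maps, one per layer/head,
    # already averaged over the sequences in the MSA
    sym = attn + attn.transpose(-1, -2)              # symmetrize each map
    corrected = torch.stack([apc(m) for m in sym])   # APC per map
    return corrected.mean(dim=0)                     # stand-in for learned weighting

attn = torch.rand(12, 128, 128)                      # illustrative attention maps
print(contacts_from_attention(attn).shape)           # torch.Size([128, 128])
```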

Turing Award honoree Yann LeCun tweeted that the research represents “huge progress” in protein contact prediction using transformer architectures.

The paper MSA Transformer is on bioRxiv.


Author: Hecate He | Editor: Michael Sarazen


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
