Since their 2017 debut as a novel approach to natural language processing (NLP), transformers have achieved epoch-making performance across an increasingly wide variety of tasks and fields and are now the architecture of choice for many machine learning researchers and practitioners in both academia and industry. Countless papers have been published on transformers and their new and improved variants, but these rarely, if ever, include pseudocode: a simplified outline of a model’s operations, formatted like computer source code but with plain-language annotations.
In the new paper Formal Algorithms for Transformers, DeepMind researchers present a precise and compact overview of transformer architectures and formal algorithms. The unique study provides pseudocode for 15 transformer algorithms, along with explanations of what transformers are, how they are trained, what they’re used for, their key architectural components, tokenization, and practical considerations for prominent models.
The researchers introduce transformers as neural networks for sequential data that are most often used for two tasks: sequence modelling and sequence-to-sequence prediction. They detail formal algorithms for both tasks and provide pseudocode that can serve as templates, adaptable to describe future variants. An explanation of the main tokenization approaches is also included to give readers insight into how text is represented in such models.
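To make the tokenization idea concrete, here is a minimal, illustrative sketch (not the paper's pseudocode) of the simplest scheme the paper discusses, character-level tokenization, where each distinct character is mapped to an integer vocabulary ID:

```python
# Illustrative character-level tokenization sketch; the paper also covers
# word-level and subword-level schemes, which trade vocabulary size for
# sequence length.
def build_vocab(corpus: str) -> dict[str, int]:
    # Assign an integer ID to each distinct character, in order of appearance.
    vocab: dict[str, int] = {}
    for ch in corpus:
        if ch not in vocab:
            vocab[ch] = len(vocab)
    return vocab

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    # Map each character of the input to its vocabulary ID.
    return [vocab[ch] for ch in text]

vocab = build_vocab("hello world")
ids = tokenize("hello", vocab)   # → [0, 1, 2, 2, 3]
```

Real subword tokenizers (e.g. BPE, as used by GPT-style models) are learned from data, but the interface is the same: text in, a sequence of integer token IDs out.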
The researchers present formal algorithms for the key components of transformer architectures, including token and positional embedding, basic single-query attention, bidirectional and unidirectional self-attention, multi-head attention, layer normalization, and unembedding; and detail prominent transformer architectures such as the original Encoder-Decoder Transformer, BERT, and GPT. The team notes that such formal, pseudocode-based representations can also benefit theoreticians interested in transformers and deep learning.
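As a hedged sketch of the most basic of these building blocks, single-query attention, the following NumPy snippet shows one query vector attending over a sequence of key/value vectors via scaled dot-product scores and a softmax (a minimal illustration in the spirit of the paper's algorithms, not its exact pseudocode):

```python
import numpy as np

def single_query_attention(q, K, V):
    """One query q (d_k,) attends over keys K (T, d_k) and values V (T, d_v)."""
    scores = K @ q / np.sqrt(q.shape[0])      # (T,) scaled dot-product scores
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()                  # weights are non-negative, sum to 1
    return weights @ V                        # (d_v,) weighted average of values

rng = np.random.default_rng(0)
q = rng.standard_normal(4)                    # one query of dimension d_k = 4
K = rng.standard_normal((6, 4))               # 6 keys
V = rng.standard_normal((6, 8))               # 6 values of dimension d_v = 8
out = single_query_attention(q, K, V)         # shape (8,)
```

Multi-head attention then amounts to running several such attention operations in parallel on learned linear projections of the inputs and concatenating the results; unidirectional (causal) self-attention simply masks out scores for future positions before the softmax.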
Overall, the paper neatly describes all aspects of transformer architectures, training, and inference; provides pseudocode for the various algorithms used in their training and deployment; and includes a useful notation glossary. The researchers hope the work will give readers at all levels a better understanding of transformers, enable them to contribute to the literature on the topic, and help developers implement their own transformer models by using the pseudocode as templates.
The paper Formal Algorithms for Transformers is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.