Google Research and DeepMind recently introduced Long-Range Arena (LRA), a benchmark for evaluating Transformer research on tasks requiring long sequence lengths.
The trainable attention mechanisms in Transformer architectures can identify complex dependencies between input sequence elements and have made Transformers the SOTA architecture in NLP and other ML research fields. Transformer architectures, however, share a drawback: their memory complexity scales quadratically with the number of tokens in an input sequence. This has made their use prohibitively expensive in domains requiring longer sequence lengths.
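A minimal NumPy sketch (not from the paper) illustrates where the quadratic cost comes from: dot-product attention materializes one score per (query, key) pair, so the score matrix alone grows as the square of the sequence length.

```python
import numpy as np

def attention_scores(q, k):
    """Dot-product attention scores: one entry per (query, key) pair."""
    return q @ k.T  # shape (n, n)

n, d = 4096, 64  # sequence length and head dimension (illustrative values)
q = np.random.randn(n, d).astype(np.float32)
k = np.random.randn(n, d).astype(np.float32)

scores = attention_scores(q, k)
print(scores.shape)             # (4096, 4096)
print(scores.nbytes // 2**20)   # 64 MiB for a single head in float32
```

Doubling the sequence length to 8K would quadruple this single-head score matrix to 256 MiB, before accounting for multiple heads, layers, and the backward pass.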
There has been growing interest in creating more efficient Transformer models to reduce memory footprints and computation requirements. In the paper Long-Range Arena: A Benchmark for Efficient Transformers, Google and DeepMind researchers introduce the LRA benchmark for evaluating Transformer models' quality and efficiency in long-context scenarios.
The LRA benchmark suite tests model capabilities in dealing with diverse data types and structures such as text, mathematics, and visual data. It includes both synthetic probing tasks and real-world tasks comprising sequences ranging from 1K to 16K tokens:
- Long ListOps
- Byte-Level Text Classification
- Byte-Level Document Retrieval
- Image Classification on Sequences of Pixels
- Pathfinder (Long-Range Spatial Dependency)
- Pathfinder-X (Long-Range Spatial Dependencies With Extreme Lengths)
The researchers used these tasks to evaluate ten recently proposed efficient Transformer models: Local Attention Model, Sparse Transformers, Reformer, Linformer, Longformer, Sinkhorn Transformers, Performers, Synthesizers, Linear Transformers, and BigBird.
The experimental results on the LRA benchmark affirmed previous observations on Transformers: the extreme length of a task can significantly obstruct models' ability to perform. For example, no model learned anything meaningful on the Pathfinder-X (Path-X) task, which is identical to the standard Pathfinder task apart from its much longer sequence lengths.
Using the Byte-Level Text Classification task, the team examined run times and memory consumption across sequence lengths. Here, the Performer and Linformer models scaled very well, with Linformer's memory usage at 3K and 4K tokens being about equal.
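To see why a model like Linformer can scale this way, consider a sketch of its low-rank idea (dimensions here are illustrative, and the random projection stands in for the learned projection in the actual model): keys and values are compressed along the sequence axis to a fixed length, so the score matrix grows linearly rather than quadratically with sequence length.

```python
import numpy as np

n, d, k_proj = 4096, 64, 256  # sequence length, head dim, projected length (illustrative)

q = np.random.randn(n, d).astype(np.float32)
k = np.random.randn(n, d).astype(np.float32)
E = np.random.randn(k_proj, n).astype(np.float32)  # stand-in for a learned projection

k_low = E @ k             # (k_proj, d): keys compressed along the sequence axis
scores = q @ k_low.T      # (n, k_proj) instead of (n, n)
print(scores.shape)             # (4096, 256)
print(scores.nbytes // 2**20)   # 4 MiB, vs 64 MiB for the full (n, n) matrix
```

Because `k_proj` stays fixed as `n` grows, memory for the scores is proportional to the sequence length itself, consistent with the near-flat memory curves reported for such models.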
The researchers say this is the first extensive side-by-side comparison of these ten Transformer models. The overall results indicate that each comes with trade-offs between quality and speed/memory, and that there is no one-size-fits-all solution. The team hopes LRA can lead to a better understanding of efficient Transformer models and spur more research in this direction.
The paper Long-Range Arena: A Benchmark for Efficient Transformers is available on arXiv, and code is open-sourced on GitHub.
Reporter: Fangyu Cai | Editor: Michael Sarazen