Google & MIT’s Confident Adaptive Language Modeling Uses Dynamic Compute Allocation to Achieve 3x Speedups

In the new paper Confident Adaptive Language Modeling, a research team from Google and MIT presents CALM, a framework that dynamically allocates different amounts of compute to each input and generation timestep, achieving up to 3x speedups while maintaining high performance.

There was a nineteenth-century saying that mocked the use of “a sledgehammer to crack a peanut.” Google AI researcher Tal Schuster echoes this concept in introducing the new paper Confident Adaptive Language Modeling. While acknowledging the tremendous power of transformer-based large language models (LLMs), Schuster notes that many of the predictions they work on “require only minimal effort.” It could be said that using the entire LLM in such cases amounts to a sledgehammer-like overkill.

LLMs’ ever-increasing computation costs and associated inference slowdowns are the main bottlenecks impeding their practical application. Developed by a Google and MIT team, the proposed Confident Adaptive Language Modeling (CALM) framework addresses these issues by dynamically allocating different compute amounts to each input and generation timestep. CALM achieves up to 3x speedups on natural language processing (NLP) tasks while maintaining high model performance.

The team summarizes their main contributions as:

A framework (CALM) for reliably accelerating transformer-based LLM generations.
A systematic analysis of the token-wise early exit mechanism that motivates a simple-but-effective class of confidence measures and threshold functions that are used as part of the CALM framework.
An empirical demonstration of CALM’s efficiency gains on three diverse generation datasets.

The proposed framework is based on a saturation theory: in many cases the top-ranked prediction in LLMs stops changing after some intermediate layer and is simply propagated upward through the remaining layers. The number of layers the model uses can thus be decided dynamically for each input.
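The saturation idea can be made concrete with a small sketch. The helper below is illustrative only: `saturation_layer` and the token ids are hypothetical names and values, not taken from the paper. It assumes we already have the top-ranked token id after every layer and finds the earliest layer from which that prediction no longer changes.

```python
from typing import Sequence


def saturation_layer(top1_per_layer: Sequence[int]) -> int:
    """Return the 1-based index of the earliest layer whose top-1 prediction
    equals the final layer's prediction and stays unchanged afterwards."""
    if not top1_per_layer:
        raise ValueError("need at least one layer")
    final = top1_per_layer[-1]
    layer = len(top1_per_layer)
    # Walk backwards while the preceding layer already agrees with the final one.
    while layer > 1 and top1_per_layer[layer - 2] == final:
        layer -= 1
    return layer


# Hypothetical top-1 token ids after each of 8 layers (made-up values).
easy_token = [17, 42, 42, 42, 42, 42, 42, 42]
hard_token = [5, 9, 9, 3, 8, 8, 61, 42]

print(saturation_layer(easy_token))  # 2
print(saturation_layer(hard_token))  # 8
```

In this toy example, the "easy" token would need only two of the eight layers, while the "hard" token needs all of them. That gap is the compute an adaptive method can try to reclaim.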

Following this idea, the team develops an adaptive compute approach that allocates computational resources per input, reducing the compute spent on easy predictions while maintaining good performance. This method is also referred to as “early exiting.”
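A minimal sketch of token-level early exiting follows. The confidence measure used here (the gap between the two largest softmax probabilities) is an assumption chosen for illustration; the article does not say which measure CALM uses. The function names and logits are hypothetical. For simplicity the per-layer logits are precomputed, whereas a real model would compute each further layer only when the exit test fails.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gap(probs: Sequence[float]) -> float:
    """Confidence score: gap between the two largest probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def early_exit_predict(
    per_layer_logits: Sequence[Sequence[float]], threshold: float
) -> Tuple[int, int]:
    """Return (token_id, layers_used), exiting at the first confident layer.

    The last layer always produces an answer, confident or not.
    """
    n_layers = len(per_layer_logits)
    for depth, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        if top2_gap(probs) >= threshold or depth == n_layers:
            token_id = max(range(len(probs)), key=probs.__getitem__)
            return token_id, depth
    raise ValueError("need at least one layer")


# Hypothetical logits over a 3-token vocabulary after each of 4 layers.
easy = [[2.0, 1.5, 0.0], [5.0, 0.5, 0.0], [6.0, 0.0, 0.0], [7.0, 0.0, 0.0]]
hard = [[1.0, 0.9, 0.0], [1.0, 1.1, 0.0], [0.5, 1.5, 0.0], [0.0, 2.0, 0.0]]

print(early_exit_predict(easy, threshold=0.9))  # (0, 2)
print(early_exit_predict(hard, threshold=0.9))  # (1, 4)
```

The threshold controls the trade-off. A higher value exits later and stays closer to the full model, while a lower value saves more compute at greater risk of changing the output.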

Building on their analysis of the early-exiting paradigm, the team developed CALM as a principled method for increasing model efficiency. CALM leverages a distribution-free risk control technique for calibrating local, per-token exit decisions, such that model performance is provably maintained with arbitrarily high probability. CALM can dynamically allocate different amounts of compute per generated token, following explicitly defined tolerance levels based on the full generation output.
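One way to picture the calibration step is the sketch below. It is a simplified illustration under stated assumptions, not the paper's exact procedure: it assumes per-example losses in [0, 1] that measure how far the early-exit output drifts from the full model's output, a Hoeffding-bound p-value, and a fixed-sequence test that walks candidate thresholds from most to least conservative. The names `delta` (tolerated risk), `epsilon` (allowed failure probability), and all data are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def hoeffding_p_value(emp_risk: float, n: int, delta: float) -> float:
    """P-value for the null 'true risk > delta', for losses bounded in [0, 1]."""
    gap = max(0.0, delta - emp_risk)
    return math.exp(-2.0 * n * gap * gap)


def calibrate_threshold(
    losses_by_threshold: Dict[float, Sequence[float]],
    delta: float,
    epsilon: float,
) -> Optional[float]:
    """Return the smallest threshold certified to keep risk <= delta.

    Thresholds are tested from highest (most conservative) downwards and the
    search stops at the first one that cannot be certified at level epsilon.
    Returns None if even the most conservative threshold fails.
    """
    chosen = None
    for lam in sorted(losses_by_threshold, reverse=True):
        losses = losses_by_threshold[lam]
        emp_risk = sum(losses) / len(losses)
        if hoeffding_p_value(emp_risk, len(losses), delta) > epsilon:
            break
        chosen = lam
    return chosen


def fake_losses(n_bad: int, n: int = 500) -> list:
    """Made-up calibration losses: n_bad examples with loss 1, the rest 0."""
    return [1.0] * n_bad + [0.0] * (n - n_bad)


# Hypothetical calibration data: lower thresholds exit earlier and drift more.
losses = {
    0.9: fake_losses(5),    # empirical risk 0.01
    0.7: fake_losses(10),   # empirical risk 0.02
    0.5: fake_losses(40),   # empirical risk 0.08
    0.3: fake_losses(100),  # empirical risk 0.20
}

print(calibrate_threshold(losses, delta=0.1, epsilon=0.05))  # 0.7
```

The threshold 0.5 has an empirical risk of 0.08, which is below the tolerance of 0.1. It is still rejected, because with 500 examples that margin is too small to certify at the chosen confidence level. That is the practical difference between a statistical guarantee and simply picking the threshold that looks good on the calibration set.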

In their empirical study, the team implemented CALM on top of the T5 encoder-decoder model and evaluated text-generation performance on three datasets: CNN/DM, WMT EN-FR, and SQuAD. The results show that CALM can reduce the model’s compute burden and deliver speedups of up to 3x while maintaining high performance.

The paper Confident Adaptive Language Modeling is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
