SuperGLUE met its match this week when, for the first time, a new model surpassed human baseline performance on the challenging natural language understanding (NLU) benchmark.
Dubbed DeBERTa (Decoding-enhanced BERT with disentangled attention), the breakthrough Transformer-based neural language model was initially introduced by a team of researchers from Microsoft Dynamics 365 AI and Microsoft Research in June of last year. Recently scaled up to 1.5 billion parameters, DeBERTa “substantially” outperformed the previous SuperGLUE leader — Google’s 11 billion parameter T5 — and surpassed the human baseline with a score of 89.9 (vs. 89.8).
In the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention, researchers detail the new DeBERTa, which improves on the BERT and RoBERTa models using two novel techniques. Introduced by Google AI in 2018, BERT is a Transformer model that pretrains deep bi-directional representations from unlabelled text and has redefined the SOTA across NLP tasks. Facebook AI's BERT-based RoBERTa, meanwhile, employs an improved training methodology to boost downstream task performance.
The first of the new techniques is a proposed disentangled self-attention mechanism. Much of the success of Transformer-based deep learning language models such as BERT has been attributed to their self-attention mechanisms, which enable each token in an input sequence to attend independently to all other tokens in the sequence. In standard models, each word in an input is represented using a single vector that is the sum of its word (content) embedding and its position embedding. The researchers, however, point out that this gives the self-attention mechanism no natural way to separate the contribution of content from that of position. DeBERTa addresses this by representing each word with two vectors, which encode its content and its position respectively, and by computing attention weights from disentangled matrices over both.
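To make the idea concrete, here is a minimal single-head sketch of disentangled attention in NumPy. It is not the paper's implementation; all names (`Wq_c`, `Wk_r`, etc.) are illustrative, and the value projection and multi-head machinery are omitted. The attention score between tokens i and j is the sum of a content-to-content term, a content-to-position term, and a position-to-content term, where positions are relative distances clipped to a window of size k:

```python
import numpy as np

def disentangled_attention_weights(H, P, Wq_c, Wk_c, Wq_r, Wk_r):
    """Sketch of disentangled attention for one head (names illustrative).

    H: (n, d) content vectors for the n input tokens.
    P: (2k, d) relative-position embeddings for distances in [-k, k).
    The four (d, d) matrices project contents (c) and positions (r)
    into queries (q) and keys (k).
    """
    n, d = H.shape
    k = P.shape[0] // 2
    Qc, Kc = H @ Wq_c, H @ Wk_c      # content projections
    Qr, Kr = P @ Wq_r, P @ Wk_r      # relative-position projections

    # delta(i, j): relative distance i - j, shifted and clipped to [0, 2k)
    idx = np.clip(np.arange(n)[:, None] - np.arange(n)[None, :] + k,
                  0, 2 * k - 1)

    c2c = Qc @ Kc.T                                        # content-to-content
    c2p = np.take_along_axis(Qc @ Kr.T, idx, axis=1)       # content-to-position
    p2c = np.take_along_axis(Kc @ Qr.T, idx, axis=1).T     # position-to-content

    scores = (c2c + c2p + p2c) / np.sqrt(3 * d)
    # row-wise softmax over the n tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)
```

The three score terms are what "disentangled" refers to: because content and position live in separate vectors, their interactions can be modelled explicitly instead of being fused into a single summed embedding.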
The second novel technique is designed to address a limitation of relying solely on relative positions, as the standard BERT model effectively does under the disentangled scheme. The Enhanced Mask Decoder (EMD) approach incorporates absolute positions in the decoding layer to predict the masked tokens during model pretraining. For example, if the words store and mall are masked for prediction in the sentence "A new store opened beside the new mall," a model using only relative positions cannot easily distinguish them, since both follow the word new. The EMD enables DeBERTa to obtain more accurate predictions, as the syntactic roles of the words also depend heavily on their absolute positions in a sentence.
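The shape of the idea can be sketched as follows. This is a deliberately simplified toy, not the paper's decoder: the actual EMD uses additional Transformer layers whose queries carry absolute-position information, whereas here the absolute position embeddings are simply added to the hidden states just before the masked-token prediction head. All names are illustrative.

```python
import numpy as np

def emd_logits(H, abs_pos, W_vocab):
    """Toy sketch of the Enhanced Mask Decoder idea (names illustrative).

    H:        (n, d) hidden states from layers that saw only relative positions.
    abs_pos:  (n, d) absolute-position embeddings, injected only here,
              near the output, rather than at the input layer.
    W_vocab:  (vocab_size, d) masked-token prediction head.
    """
    Q = H + abs_pos          # hidden states now also carry absolute positions
    return Q @ W_vocab.T     # (n, vocab_size) logits for masked-token prediction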
In experiments on the NLU benchmark SuperGLUE, a DeBERTa model scaled up to 1.5 billion parameters outperformed Google's 11 billion parameter T5 language model by 0.6 percent and was the first model to surpass the human baseline. Moreover, compared to the robust RoBERTa and XLNet models, DeBERTa demonstrated stronger performance on NLU and NLG (natural language generation) tasks while also being more efficient to pretrain.
Reporter: Fangyu Cai | Editor: Michael Sarazen