Microsoft DeBERTa Tops Human Performance on SuperGLUE NLU Benchmark

A new model surpassed human baseline performance on the challenging natural language understanding benchmark.

SuperGLUE met its match this week when, for the first time, a new model surpassed human baseline performance on the challenging natural language understanding (NLU) benchmark.

Dubbed DeBERTa (Decoding-enhanced BERT with disentangled attention), the breakthrough Transformer-based neural language model was initially introduced by a team of researchers from Microsoft Dynamics 365 AI and Microsoft Research in June of last year. Recently scaled up to 1.5 billion parameters, DeBERTa “substantially” outperformed the previous SuperGLUE leader — Google’s 11 billion parameter T5 — and surpassed the human baseline with a score of 89.9 (vs. 89.8).

In the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention, the researchers detail how DeBERTa improves on the BERT and RoBERTa models using two novel techniques. Introduced by Google AI in 2018, BERT is a Transformer-based model that pretrains deep bidirectional representations from unlabelled text and has redefined the SOTA across many NLP tasks. Facebook AI’s BERT-based RoBERTa, meanwhile, employs an improved training methodology to boost downstream task performance.

The first of the new techniques is a proposed disentangled self-attention mechanism. Much of the success of Transformer-based language models such as BERT has been attributed to self-attention, which enables each token in an input sequence to attend independently to all other tokens in the sequence. In a standard Transformer, each input word is represented by a single vector that sums its word (content) embedding and its position embedding, so content and position signals are entangled when attention weights are computed. The researchers point out that this leaves the model no natural way to weigh the two kinds of information separately. DeBERTa addresses this by representing each word with two vectors, which encode its content and its relative position respectively, and by computing attention weights from disentangled matrices on both.
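To make the idea concrete, here is a minimal single-head sketch of disentangled attention in PyTorch. The function name, shapes, and the clipped relative-distance scheme are our own simplifying assumptions, not Microsoft’s released implementation, but the score is the sum of the content-to-content, content-to-position, and position-to-content terms described in the paper, with the paper’s scaling factor of the square root of 3d:

```python
import torch
import torch.nn.functional as F

def disentangled_attention(H, P, Wq_c, Wk_c, Wq_r, Wk_r):
    """H: (n, d) token content vectors; P: (2k, d) relative-position
    embeddings for clipped distances in [-k, k); W*: (d, d) projections."""
    n, d = H.shape
    k = P.shape[0] // 2

    Qc, Kc = H @ Wq_c, H @ Wk_c          # content queries / keys
    Qr, Kr = P @ Wq_r, P @ Wk_r          # position queries / keys

    # delta[i, j]: relative distance i - j, clipped and shifted into [0, 2k)
    idx = torch.arange(n)
    delta = (idx[:, None] - idx[None, :]).clamp(-k, k - 1) + k

    c2c = Qc @ Kc.T                                # content-to-content
    c2p = torch.gather(Qc @ Kr.T, 1, delta)        # content-to-position
    p2c = torch.gather(Kc @ Qr.T, 1, delta).T      # position-to-content

    A = (c2c + c2p + p2c) / (3 * d) ** 0.5         # scaled attention scores
    return F.softmax(A, dim=-1) @ H                # attention output

# Toy usage with random weights
n, d, k = 8, 16, 4
H, P = torch.randn(n, d), torch.randn(2 * k, d)
Ws = [torch.randn(d, d) / d ** 0.5 for _ in range(4)]
print(disentangled_attention(H, P, *Ws).shape)  # torch.Size([8, 16])
```

Note that positional information enters the attention scores through its own projection matrices rather than being summed into the token vectors, which is what lets content and position contribute to attention independently.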

The second novel technique is designed to deal with a limitation of purely relative position encodings such as the one used in the disentangled attention mechanism. The Enhanced Mask Decoder (EMD) approach incorporates absolute positions in the decoding layer to predict the masked tokens during model pretraining. For example, if the words store and mall are masked for prediction in the sentence “A new store opened near the new mall,” a model relying only on relative positions will struggle to tell the two masked tokens apart, because they appear in similar local contexts. Since the syntactic roles of words also depend heavily on their absolute positions in a sentence, the EMD enables DeBERTa to obtain more accurate predictions.
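The mechanics can be illustrated with a hypothetical prediction head. The class below, its layer sizes, and its single late injection point are our own simplification for illustration (the paper feeds absolute positions into the final decoding layers), not the official architecture:

```python
import torch
import torch.nn as nn

class EnhancedMaskDecoderHead(nn.Module):
    """Illustrative MLM head that fuses absolute positions late, rather
    than adding them to the input embeddings as standard BERT does."""
    def __init__(self, d_model, vocab_size, max_len=512):
        super().__init__()
        self.abs_pos = nn.Embedding(max_len, d_model)   # absolute positions
        self.transform = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.LayerNorm(d_model))
        self.to_vocab = nn.Linear(d_model, vocab_size)  # masked-token logits

    def forward(self, hidden):                 # hidden: (batch, n, d_model)
        n = hidden.size(1)
        pos = self.abs_pos(torch.arange(n, device=hidden.device))
        h = self.transform(hidden + pos)       # absolute positions enter here
        return self.to_vocab(h)                # predict the masked tokens
```

The idea is that the lower layers keep modelling relative structure, while the head that actually predicts the masked tokens can also exploit where each token sits in the sentence as a whole.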

In experiments on the SuperGLUE NLU benchmark, a DeBERTa model scaled up to 1.5 billion parameters outperformed Google’s 11 billion parameter T5 language model by 0.6 points and was the first model to surpass the human baseline. Moreover, compared with the strong RoBERTa and XLNet models, DeBERTa demonstrated better performance on NLU and NLG (natural language generation) tasks with better pretraining efficiency.

The paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention is on arXiv. The team says it will soon update its GitHub code repository with the latest DeBERTa code and models.


Reporter: Fangyu Cai | Editor: Michael Sarazen


