
Microsoft’s Parameter-Efficient Z-Code++ Language Model Beats the 200x Larger GPT3-175B on Abstractive Text Summarization


Abstractive text summarization is a natural language processing (NLP) task that aims at generating concise and fluent document summaries. The recent development of large-scale pretrained language models has greatly advanced abstractive text summarization performance, but such models can suffer from the “hallucination problem,” where the generated summaries can become nonsensical or unfaithful to the input document.

In the new paper Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization, a research team from Microsoft Azure AI and Microsoft Research presents Z-Code++, a novel encoder-decoder pretrained language model optimized for abstractive summarization that significantly improves performance on low-resource summarization tasks, outperforming the finetuned and 200x larger GPT3-175B on the SAMSum human-annotated dialogue dataset.

The proposed Z-Code++ extends the encoder-decoder model via three techniques: 1) a two-phase pretraining process, 2) a disentangled attention mechanism, and 3) a fusion-in-encoder method.

The two-phase pretraining consists of a language model pretraining phase and a grounded pretraining phase. In the language model pretraining phase, Z-Code++ is pretrained using replaced token detection (RTD) and corrupted span prediction (CSP). In RTD, a generator produces ambiguous replacement tokens, and a discriminator then classifies each token as coming from either the original input or the generator. In CSP, contiguous spans of the input are masked and the model learns to reconstruct them. In the grounded pretraining phase, the model is further trained on summarization corpora of document-summary pairs, which significantly improves performance on downstream tasks in low-resource settings.
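To make the two language model pretraining objectives concrete, here is a minimal sketch of how their training targets could be built from a toy sentence. It is an illustration under stated assumptions, not the paper's data pipeline: the function names, the `<span_n>` sentinel format, and the hand-picked replacement token and span positions are all hypothetical.

```python
def rtd_labels(original, corrupted):
    """Discriminator targets for RTD: 1 where the token differs from the
    original (i.e. the generator replaced it), 0 where it is unchanged."""
    return [int(o != c) for o, c in zip(original, corrupted)]


def corrupt_spans(tokens, spans):
    """Build a CSP-style (source, target) pair.

    `spans` is a list of non-overlapping (start, length) pairs. Each span is
    replaced in the source by a sentinel (hypothetical `<span_n>` format), and
    the target lists each sentinel followed by the tokens it hides.
    """
    source, target = [], []
    cursor = 0
    for n, (start, length) in enumerate(sorted(spans)):
        sentinel = f"<span_{n}>"
        source.extend(tokens[cursor:start])   # copy untouched tokens
        source.append(sentinel)               # hide the span in the input
        target.append(sentinel)
        target.extend(tokens[start:start + length])
        cursor = start + length
    source.extend(tokens[cursor:])            # copy the tail
    return source, target


tokens = "the cat sat on the mat".split()

# RTD: pretend a generator swapped "cat" for the plausible token "dog".
generated = ["the", "dog", "sat", "on", "the", "mat"]
print(rtd_labels(tokens, generated))  # [0, 1, 0, 0, 0, 0]

# CSP: hide "cat sat" and "mat".
src, tgt = corrupt_spans(tokens, [(1, 2), (5, 1)])
print(src)  # ['the', '<span_0>', 'on', 'the', '<span_1>']
print(tgt)  # ['<span_0>', 'cat', 'sat', '<span_1>', 'mat']
```

In a real setup the replacements would be sampled from a trained generator and the spans chosen at random; both are fixed here so the output is easy to follow.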

The researchers also replace the transformer self-attention layer in the encoder with disentangled attention (DA), which represents a word using two vectors that encode its content and position. DA is more efficient than the classic self-attention mechanism for encoding positional dependency and thus improves text summarization performance. Finally, a simple but effective fusion-in-encoder mechanism is employed, which can encode long sequences while maintaining high attention precision for short sequences.
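The article does not spell out the attention formula, so the sketch below assumes the disentangled attention formulation published with DeBERTa: each score is the sum of a content-to-content, a content-to-position, and a position-to-content term, with relative distances clipped to a window of size `k` and the sum scaled by 1/sqrt(3d). The function and variable names are illustrative, and the vectors are passed in already projected to keep the example short.

```python
import math


def rel_bucket(i, j, k):
    """Map the relative distance i - j to a bucket index in [0, 2k)."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def disentangled_scores(q_c, k_c, q_r, k_r, k):
    """Unnormalized attention scores for one head (assumed DeBERTa-style).

    q_c, k_c: content query/key vectors, one per token.
    q_r, k_r: relative-position query/key vectors, one per bucket (2k of each).
    """
    n, d = len(q_c), len(q_c[0])
    scale = 1.0 / math.sqrt(3 * d)  # three terms are summed, hence 3d
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c2c = dot(q_c[i], k_c[j])                    # content-to-content
            c2p = dot(q_c[i], k_r[rel_bucket(i, j, k)])  # content-to-position
            p2c = dot(k_c[j], q_r[rel_bucket(j, i, k)])  # position-to-content
            scores[i][j] = (c2c + c2p + p2c) * scale
    return scores


# Toy usage: two tokens, 2-dimensional vectors, window k = 1 (two buckets).
q_c = [[1.0, 0.0], [0.0, 1.0]]
k_c = [[1.0, 0.0], [0.0, 1.0]]
q_r = [[0.5, 0.0], [0.0, 0.5]]
k_r = [[0.0, 0.5], [0.5, 0.0]]
scores = disentangled_scores(q_c, k_c, q_r, k_r, k=1)
```

A softmax over each row of `scores` would then give the attention weights. Because the position terms depend only on the clipped relative distance, the same small table of position vectors serves every token pair.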

In their empirical study, the team compared the proposed Z-Code++ with baseline methods on representative summarization tasks. In the evaluations, Z-Code++ achieved state-of-the-art performance on 9 out of 13 text summarization tasks across five languages, outperforming the 600x larger PaLM540B on XSum and the 200x larger finetuned GPT3-175B on SAMSum. Z-Code++ also demonstrated superior performance in zero-shot and few-shot settings.
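As a quick sanity check on the size comparisons, the stated ratios can be inverted to see what model size they imply. The article does not give Z-Code++'s parameter count, and the "200x" and "600x" figures are presumably rounded, so the numbers below are rough estimates derived only from those ratios; the helper name is hypothetical.

```python
def implied_params(larger_model_params, ratio):
    """Parameter count implied by 'the other model is `ratio` times larger'."""
    return larger_model_params / ratio


from_gpt3 = implied_params(175e9, 200)  # GPT3-175B, "200x larger"
from_palm = implied_params(540e9, 600)  # PaLM540B, "600x larger"

print(f"{from_gpt3 / 1e6:.0f}M")  # 875M
print(f"{from_palm / 1e6:.0f}M")  # 900M
```

Both ratios point to a model with somewhat under one billion parameters, so the two comparisons are consistent with each other.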

Overall, Z-Code++ is shown to be a parameter-efficient pretrained language model with impressive performance on abstractive text summarization tasks.

The paper Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization is on arXiv.


Author: Hecate He | Editor: Michael Sarazen

