Microsoft’s UPRISE Automatically Retrieves Prompts to Boost the Zero-Shot Performance of Large Language Models

Synced

3 years ago

Pretrained large language models (LLMs) have emerged as a driving force in the evolution of AI systems, and the global race is on to make such models even more powerful. Promising research directions for improving LLMs include model-specific fine-tuning and task-specific prompt engineering. Both approaches, however, have their downsides: the former can be computationally costly, while the latter lacks generalization capabilities.

In the new paper UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation, a Microsoft research team introduces an approach that tunes a lightweight and versatile retriever to automatically select prompts for any given task input, improving the zero-shot performance of LLMs.

The team summarizes their main contributions as follows:

We introduce UPRISE, a lightweight and versatile approach to improve zero-shot performance of LLMs in the cross-task and cross-model scenarios.
UPRISE is tuned with GPT-Neo-2.7B, but can also benefit different LLMs of much larger scales, such as BLOOM-7.1B, OPT-66B and GPT3-175B.
Our exploration on ChatGPT demonstrates the potential of UPRISE in improving performance of even the strongest LLMs.

The UPRISE prompting process comprises two straightforward steps: retrieve, then predict. Given an input, UPRISE first retrieves a set of positive prompts from a preconstructed pool, then concatenates them with the input to form an input sequence. This is fed to a frozen LLM (fixed weights/parameters), which generates a predicted output.
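The retrieve-then-predict flow can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: a bag-of-words cosine similarity stands in for the tuned retriever, and the names `embed`, `retrieve`, `build_sequence`, `retrieve_then_predict` and the `llm` callable are hypothetical placeholders introduced here.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; an assumption standing in for the tuned retriever.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(task_input: str, pool: List[str], k: int = 2) -> List[str]:
    # Step 1: rank every prompt in the preconstructed pool and keep the top k.
    query = embed(task_input)
    ranked = sorted(pool, key=lambda p: cosine(query, embed(p)), reverse=True)
    return ranked[:k]


def build_sequence(task_input: str, prompts: List[str]) -> str:
    # Concatenate the retrieved prompts with the task input (prompts first).
    return "\n\n".join(prompts + [task_input])


def retrieve_then_predict(
    task_input: str, pool: List[str], llm: Callable[[str], str], k: int = 2
) -> str:
    # Step 2: feed the concatenated sequence to a frozen LLM.
    # `llm` is any text-in, text-out callable; its weights are never updated here.
    return llm(build_sequence(task_input, retrieve(task_input, pool, k)))


pool = [
    "Review: the movie was great. Sentiment: positive",
    "Question: what is the capital of France? Answer: Paris",
    "Review: the food was awful. Sentiment: negative",
]
task_input = "Review: the movie was awful. Sentiment:"
retrieved = retrieve(task_input, pool, k=2)
sequence = build_sequence(task_input, retrieved)
```

With this toy pool, the two review-style prompts are retrieved ahead of the unrelated question-answering prompt, and the task input is placed at the end of the sequence so the frozen model completes it.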

Central to the proposed approach is the prompt retriever. In the training stage, the frozen LLM supervises the prompt retriever’s fine-tuning across a set of tasks. In the inference stage, the trained retriever retrieves appropriate prompts for different task types and different LLMs. This cross-task and cross-model paradigm gives UPRISE its universality: the ability to generalize from task types seen in training to unseen ones without further tuning.
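One plausible reading of "the frozen LLM supervises the retriever" is sketched below. The article does not spell out the mechanics, so everything here is an assumption made for illustration: candidate prompts are scored by how well the frozen LLM performs with each one, the best are labeled positive and the worst negative, and the retriever is trained with a contrastive loss over its similarity scores. The names `label_prompts`, `llm_score` and `contrastive_loss` are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def label_prompts(
    task_input: str,
    target: str,
    candidates: List[str],
    llm_score: Callable[[str, str, str], float],
    n: int = 1,
) -> Tuple[List[str], List[str]]:
    # `llm_score(prompt, task_input, target)` is a hypothetical callable returning
    # how well the frozen LLM does on this example when given `prompt`.
    # Assumes len(candidates) >= 2 * n so positives and negatives do not overlap.
    ranked = sorted(
        candidates, key=lambda p: llm_score(p, task_input, target), reverse=True
    )
    return ranked[:n], ranked[-n:]


def contrastive_loss(sim_pos: float, sim_negs: List[float]) -> float:
    # Softmax cross-entropy with the positive as the correct class:
    # -log(exp(sim_pos) / (exp(sim_pos) + sum(exp(sim_neg)))),
    # computed with the log-sum-exp trick for numerical stability.
    logits = [sim_pos] + sim_negs
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - sim_pos


# Toy scores standing in for real LLM feedback (made up for illustration).
toy_scores = {"prompt A": 0.9, "prompt B": 0.4, "prompt C": 0.1}
positives, negatives = label_prompts(
    "some input",
    "some target",
    list(toy_scores),
    lambda prompt, _input, _target: toy_scores[prompt],
)
```

The loss falls as the retriever assigns a higher similarity to the positive prompt than to the negatives, which is the signal that would drive the fine-tuning in this sketch. Only the retriever would be updated; the LLM stays frozen throughout.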

In their empirical study, the team evaluated UPRISE on various natural language understanding tasks. UPRISE outperformed vanilla zero-shot prompting in the experiments and demonstrated strong universality in a cross-task and cross-model scenario. Moreover, the researchers note that UPRISE also mitigated the hallucination problems that have impaired ChatGPT performance, suggesting their approach’s potential to improve even the strongest LLMs.

The paper UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
