Pretrained large language models (LLMs) have emerged as a driving force in the evolution of AI systems, and a global race is on to make such models even more powerful. Promising research directions for improving LLMs include model-specific fine-tuning and task-specific prompt engineering. Both approaches, however, have downsides: the former can be computationally costly, while the latter lacks generalization capabilities.
In the new paper UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation, a Microsoft research team introduces a novel approach that tunes a lightweight and versatile retriever to retrieve prompts for any given task input to improve the zero-shot performance of LLMs.

The team summarizes their main contributions as follows:
- We introduce UPRISE, a lightweight and versatile approach for improving the zero-shot performance of LLMs in cross-task and cross-model scenarios.
- UPRISE is tuned with GPT-Neo-2.7B but can also benefit LLMs of much larger scale, such as BLOOM-7.1B, OPT-66B, and GPT3-175B.
- Our exploration with ChatGPT demonstrates UPRISE's potential to improve the performance of even the strongest LLMs.
The UPRISE prompting process comprises two straightforward steps: retrieve, then predict. Given a task input, UPRISE first retrieves a set of relevant prompts from a pre-constructed pool, then concatenates them with the input to form an input sequence. This sequence is fed to a frozen LLM (i.e., one whose weights remain fixed), which generates the predicted output.
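To make the pipeline concrete, here is a minimal sketch of the retrieve-then-predict loop, assuming a small off-the-shelf embedding model as the retriever and a toy prompt pool; these stand-ins are illustrative, not the paper's actual retriever or pool:

```python
# Minimal sketch of UPRISE-style "retrieve, then predict" inference.
# The prompt pool, retriever, and decoding settings are illustrative
# stand-ins, not the paper's actual components.
from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical pre-constructed prompt pool (UPRISE builds its pool from
# demonstrations of the training tasks).
PROMPT_POOL = [
    "Review: 'A gripping, well-acted film.' Sentiment: positive",
    "Question: Is the Earth flat? Answer: no",
    "Premise: 'A dog sleeps.' Hypothesis: 'An animal rests.' Label: entailment",
]

retriever = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in retriever
pool_emb = retriever.encode(PROMPT_POOL, convert_to_tensor=True)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
llm = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
llm.eval()  # frozen LLM: its weights are never updated

def uprise_predict(task_input: str, k: int = 2, max_new_tokens: int = 8) -> str:
    # Step 1: retrieve the k most relevant prompts from the pool.
    query_emb = retriever.encode(task_input, convert_to_tensor=True)
    top_k = util.cos_sim(query_emb, pool_emb)[0].topk(k).indices.tolist()
    # Step 2: concatenate the retrieved prompts with the input and predict.
    context = "\n".join(PROMPT_POOL[i] for i in top_k)
    ids = tokenizer(context + "\n" + task_input, return_tensors="pt").input_ids
    out = llm.generate(ids, max_new_tokens=max_new_tokens, do_sample=False,
                       pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

print(uprise_predict("Review: 'Dull and overlong.' Sentiment:"))
```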

Central to the proposed approach is the prompt retriever. In the training stage, the frozen LLM supervises the retriever's fine-tuning across a diverse set of tasks. In the inference stage, the trained retriever fetches appropriate prompts for different task types and different LLMs. This cross-task and cross-model paradigm equips UPRISE with universality: the ability to generalize from task types seen during training to unseen ones, without further tuning.
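The paper's full training recipe is more involved, but the core idea of the frozen LLM acting as supervisor can be sketched as follows: each candidate prompt is scored by the probability the LLM assigns to the gold answer when that prompt is prepended, and high and low scorers can then serve as positives and hard negatives for contrastively fine-tuning the retriever. The scoring rule and names below are assumptions for illustration:

```python
# Sketch of LLM-supervised prompt scoring for retriever training.
# The scoring rule and candidate prompts are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
llm = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
llm.eval()  # the supervising LLM stays frozen throughout

@torch.no_grad()
def prompt_score(prompt: str, task_input: str, gold: str) -> float:
    """Mean log-probability of the gold answer given prompt + input."""
    prefix_ids = tokenizer(prompt + "\n" + task_input,
                           return_tensors="pt").input_ids
    gold_ids = tokenizer(gold, return_tensors="pt").input_ids
    logits = llm(torch.cat([prefix_ids, gold_ids], dim=1)).logits
    # Each gold token is predicted from the position just before it.
    logp = torch.log_softmax(logits[0, prefix_ids.shape[1] - 1:-1], dim=-1)
    return logp.gather(1, gold_ids[0].unsqueeze(1)).mean().item()

# Rank candidate prompts for one training example; the top scorers would
# act as positives (and the bottom as hard negatives) in a contrastive
# objective for the retriever. Candidates here are placeholders.
candidates = ["Sentiment example ...", "Entailment example ...", "QA example ..."]
x, y = "Review: 'Dull and overlong.' Sentiment:", " negative"
ranked = sorted(candidates, key=lambda c: prompt_score(c, x, y), reverse=True)
```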


In their empirical study, the team evaluated UPRISE on a variety of natural language understanding tasks. UPRISE outperformed vanilla zero-shot prompting across the experiments and demonstrated strong universality in cross-task and cross-model scenarios. The researchers also note that UPRISE mitigated the hallucination problem that has impaired ChatGPT's performance, suggesting the approach's potential to improve even the strongest LLMs.
The paper UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

Comment (John): Great post! It's fascinating to see how UPRISE improves the zero-shot performance of LLMs, especially in cross-task and cross-model scenarios. I'm curious: how does UPRISE's prompt retrieval process compare to other prompt engineering methods in terms of efficiency and effectiveness?
Author reply: Hi, thanks for your interest. While our research focused on comparisons with vanilla zero-shot prompting, we believe different prompt engineering methods are complementary rather than competing. For instance, incorporating methods such as zero-shot CoT into our prompt pool or instruction templates could enhance UPRISE's performance. We therefore leave such comparisons to future work.