Anyone taking an exam understands the linguistic and other challenges associated with providing paragraph-length answers to open-ended questions compared to simple yes/no or multiple-choice questions. Such long-form question-answering (LFQA) presents similar challenges in the natural language processing (NLP) research field, where existing approaches tend to focus on two core task components: information retrieval and synthesis.
In the new paper WebGPT: Browser-assisted Question-answering with Human Feedback, an OpenAI research team combines these existing approaches with improved training objectives. They use the Microsoft Bing Web Search API for document retrieval, and they fine-tune the GPT-3 large language model, which already benefits from unsupervised pretraining, for high-quality synthesis. They then use human feedback to directly optimize answer quality, enabling their method to achieve human-level performance on LFQA tasks.
The team summarizes their key contributions as follows:
- We create a text-based web-browsing environment that a fine-tuned language model can interact with. This allows us to improve both retrieval and synthesis in an end-to-end fashion using general methods such as imitation learning and reinforcement learning.
- We generate answers with references: passages extracted by the model from web pages while browsing. This is crucial for allowing labellers to judge the factual accuracy of answers without engaging in a difficult and subjective process of independent research.
Contemporary search engines are powerful, fast, and can deliver up-to-date knowledge. As a result, people increasingly rely on them when seeking answers to questions, and estimates of total daily web searches run into the billions. The OpenAI researchers thus set out to design a text-based web-browsing environment that would allow pretrained language models to mimic such human web search behaviour.
Prompted with a question and some contextual and supporting information, the proposed WebGPT model performs web-based actions such as running a Bing search, clicking on links, scrolling through documents, and extracting references and quotes. Browsing continues until the model issues a command to end browsing, reaches the maximum number of actions, or reaches the maximum total length of references. Finally, if at least one relevant reference has been collected, the model composes a long-form answer to the question.
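The browsing loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the names `BrowsingState`, `browse`, `should_answer` and `scripted_policy`, the command strings, and the two limit values are all assumptions made for the example, and the policy here is a hard-coded script standing in for the fine-tuned language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical limits: the article does not state the actual values.
MAX_ACTIONS = 100
MAX_REFERENCE_CHARS = 2000


@dataclass
class BrowsingState:
    """Minimal record of one browsing episode (assumed structure)."""
    question: str
    references: List[str] = field(default_factory=list)
    actions_taken: int = 0

    def reference_length(self) -> int:
        # Total length of all quoted passages collected so far.
        return sum(len(ref) for ref in self.references)


# A policy maps the current state to a (command, argument) pair.
Policy = Callable[[BrowsingState], Tuple[str, str]]


def browse(question: str, policy: Policy) -> BrowsingState:
    """Run the policy until one of the three stopping conditions holds."""
    state = BrowsingState(question)
    while True:
        if state.actions_taken >= MAX_ACTIONS:
            break  # maximum number of actions reached
        if state.reference_length() >= MAX_REFERENCE_CHARS:
            break  # maximum total length of references reached
        command, argument = policy(state)
        state.actions_taken += 1
        if command == "end":
            break  # the model chose to stop browsing
        if command == "quote":
            state.references.append(argument)
        # "search", "click" and "scroll" would change the visible page in a
        # real environment; that part is omitted from this sketch.
    return state


def should_answer(state: BrowsingState) -> bool:
    """An answer is composed only if at least one reference was collected."""
    return len(state.references) > 0


def scripted_policy(state: BrowsingState) -> Tuple[str, str]:
    """Stand-in for the language model: replays a fixed list of commands."""
    script = [
        ("search", "why is the sky blue"),
        ("click", "0"),
        ("quote", "Shorter wavelengths of sunlight are scattered more strongly."),
        ("end", ""),
    ]
    return script[state.actions_taken]


final_state = browse("Why is the sky blue?", scripted_policy)
print(final_state.actions_taken, should_answer(final_state))
```

In this sketch the stopping conditions are checked before each action, so a policy that never issues the end command still terminates once the action budget is spent.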
The team also designed a graphical interface for their text-based web-browsing environment to enable users to provide auxiliary annotations and comparison ratings to further improve the model’s understanding of the questions.
The team fine-tuned GPT-3 models in 760M, 13B and 175B sizes and used four main training methods: behaviour cloning (BC), reward modelling (RM), reinforcement learning (RL) and rejection sampling (best-of-n). They evaluated the proposed WebGPT on questions from the ELI5 (Explain Like I’m 5) subreddit, with human evaluators’ judgments based on the criteria that answers should be relevant, coherent, and supported by trustworthy references.
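Of the four methods, rejection sampling (best-of-n) is the simplest to illustrate: sample n candidate answers, score each with the reward model, and keep the highest-scoring one. The sketch below assumes hypothetical `sample_answer` and `reward` callables standing in for the fine-tuned model and the trained reward model; the toy versions used in the demo are purely illustrative and say nothing about how the actual models behave.

```python
import random
from typing import Callable


def best_of_n(
    question: str,
    sample_answer: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 64,
) -> str:
    """Sample n candidate answers and return the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample_answer(question) for _ in range(n)]
    return max(candidates, key=lambda answer: reward(question, answer))


# Toy stand-ins (assumptions for illustration only).
def toy_sampler(question: str) -> str:
    return random.choice([
        "Short answer.",
        "A somewhat longer answer with one supporting detail.",
        "A detailed answer that cites two passages and explains each step.",
    ])


def toy_reward(question: str, answer: str) -> float:
    # Pretend that longer answers are better; a real reward model would be
    # trained on human comparison ratings instead.
    return float(len(answer))


print(best_of_n("Why is the sky blue?", toy_sampler, toy_reward, n=8))
```

The design trades extra inference-time compute for quality: no further training is needed, but each question costs n samples plus n reward-model evaluations.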
In the evaluations, the 175B best-of-64 WebGPT model’s answers were preferred to those written by human demonstrators 56 percent of the time and preferred to the reference answers from the ELI5 dataset 69 percent of the time.
Overall, the work demonstrates that a fine-tuned pretrained language model leveraging a text-based web-browsing environment can achieve high answer quality on LFQA tasks, even outperforming humans on the ELI5 dataset.
The paper WebGPT: Browser-assisted Question-answering with Human Feedback is on OpenAI.com.
Author: Hecate He | Editor: Michael Sarazen