Large-scale pretrained language models have advanced the development and deployment of open-domain conversation agents (chatbots). Although fine-tuning these models with target-specific data can significantly improve their performance on downstream tasks, a reliance on crowdsourced data for this purpose is problematic: it is difficult to scale such data, and it may not reflect the interests of organic users.
A simple and practical solution that circumvents these issues is to deploy the model publicly, which enables large-scale organic human interactions that can be leveraged for additional training and continual model improvement.
In the new paper BlenderBot 3: A Deployed Conversational Agent That Continually Learns to Responsibly Engage, researchers from Meta AI and Mila/McGill University release BlenderBot 3 (BB3), a 175B parameter state-of-the-art open-domain dialogue model deployed on a public website. BlenderBot 3 has full access to the Internet and long-term memory and is designed for continual learning via user interactions.
The team summarizes their main contributions as follows:
- We present the BlenderBot 3 (BB3) model itself, which is a 175B parameter transformer initialized from the pre-trained model OPT-175B (Zhang et al., 2022) and then fine-tuned to perform modular tasks to complete its goals.
- We study how to train on human feedback from conversations in order to be better at the skills that people find important, with a full report given in a companion paper (Xu et al., 2022b).
- We detail the deployment design, including its user interface (UI). We report initial experiments conducted with organic user interactions.
- To conduct responsible continual learning with humans-in-the-loop we need learning algorithms that are robust to adversarial behaviour. We describe techniques we have developed in this area, with a full report given in a companion paper (Ju et al., 2022).
- We report overall results of our model. Our newly released system outperforms existing publicly available chatbots including its two predecessors by a wide margin.
- We release our new model weights, code, model card, conversational datasets and publications describing our work. We also detail our plan for releasing live deployment interactions and updated model snapshots derived from continual learning in the near future.
BB3 is built on a single transformer model and generates its dialogue via a series of dependent modules that perform sequence-to-sequence tasks. After a given module is executed, its generated output is fed into the next module to help produce a response. Each module thus receives the dialogue history, annotated with speaker identification, together with the outputs of the previous modules.
The paper presents an overview of the BB3 pipeline with regard to its modules’ roles:
- Internet search decision: outputs whether or not an Internet search should be conducted.
- Generate Internet search query: generates a search query based on the full input context.
- Internet search: calls an Internet search engine.
- Generate knowledge response: generates a knowledge response for the corresponding input context.
- Extract relevant entity: obtains a relevant entity for grounding the final response.
- Generate a long-term memory: outputs a summary of the last turn, which is stored in long-term memory.
- Long-term memory access decision: outputs whether or not long-term memory should be accessed.
- Generate dialogue response: produces the final conversational response.
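The chaining of modules described above can be illustrated with a minimal Python sketch. It is not the BB3 implementation: the function names, the task labels, the decision strings ("search", "access") and the exact control flow are all assumptions made for illustration, and the model and search engine are replaced by stubs. The only point it shows is that one shared model is called once per module and that each output feeds the next call.

```python
from typing import Callable, Dict, List

# Hypothetical interface: one shared model, told which module task to perform.
Seq2Seq = Callable[[str, str], str]


def run_pipeline(model: Seq2Seq, search: Callable[[str], List[str]],
                 history: List[str], memory: List[str]) -> Dict[str, str]:
    """Run the modules in sequence; each output feeds a later module."""
    context = "\n".join(history)
    trace: Dict[str, str] = {}

    # Decide whether to search; if so, generate a query and call the engine.
    documents: List[str] = []
    trace["search_decision"] = model("search_decision", context)
    if trace["search_decision"] == "search":
        trace["search_query"] = model("search_query", context)
        documents = search(trace["search_query"])

    # Decide whether to consult long-term memory.
    recalled: List[str] = []
    trace["memory_decision"] = model("memory_decision", context)
    if trace["memory_decision"] == "access":
        recalled = list(memory)

    # Assumed grounding rule: use retrieved documents when there are any,
    # otherwise fall back to a relevant entity.
    grounding_input = "\n".join([context, *documents, *recalled])
    if documents:
        grounding = model("knowledge_response", grounding_input)
        trace["knowledge"] = grounding
    else:
        grounding = model("extract_entity", grounding_input)
        trace["entity"] = grounding

    # The final response is conditioned on the context plus the grounding.
    trace["response"] = model("dialogue_response", context + "\n" + grounding)

    # Summarize the last turn and store it in long-term memory.
    last_turn = history[-1] if history else ""
    trace["new_memory"] = model("generate_memory", last_turn)
    memory.append(trace["new_memory"])
    return trace


def stub_model(task: str, text: str) -> str:
    """Stand-in for the real model: returns a canned string per task."""
    canned = {
        "search_decision": "search",
        "search_query": "query",
        "memory_decision": "skip",
        "knowledge_response": "fact",
        "extract_entity": "entity",
        "dialogue_response": "reply",
        "generate_memory": "summary",
    }
    return canned[task]


def stub_search(query: str) -> List[str]:
    """Stand-in for a search engine call."""
    return [f"document about {query}"]
```

Calling `run_pipeline(stub_model, stub_search, ["User: hello"], [])` returns a dictionary that records each module's output, which makes the dependency between steps easy to inspect.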
A key contribution of this work is training the model to better learn from human feedback via public conversations. The massive organic data collected from these interactions enables BB3’s continual learning and establishes it as an ever-improving agent.
The researchers believe that BB3’s “in the wild” human conversational data collection scheme will accommodate natural, longer and more diverse conversations and encourage more varied human feedback. Demo users can react to the bot’s messages with either thumbs-up or thumbs-down icons and specify why they disliked a message, “whether it was off-topic, nonsensical, rude, spam-like, or other.” Efforts have been made to mitigate harmful or inappropriate content generation using various benchmarks and techniques.
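As an illustration of the feedback signal described above, the following Python sketch models one reaction to a bot message. The class and field names are hypothetical and do not come from the BB3 release; only the thumbs-up/thumbs-down choice and the five dislike reasons are taken from the text. The rule that a reason may accompany only a thumbs-down is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DislikeReason(Enum):
    """Reasons a user can give for disliking a message (from the article)."""
    OFF_TOPIC = "off-topic"
    NONSENSICAL = "nonsensical"
    RUDE = "rude"
    SPAM_LIKE = "spam-like"
    OTHER = "other"


@dataclass(frozen=True)
class Feedback:
    """One user reaction to one bot message (hypothetical schema)."""
    message_id: str
    thumbs_up: bool
    reason: Optional[DislikeReason] = None

    def __post_init__(self) -> None:
        # Assumed rule: a dislike reason only makes sense with a thumbs-down.
        if self.thumbs_up and self.reason is not None:
            raise ValueError("a dislike reason requires a thumbs-down")
```

For example, `Feedback("msg-1", thumbs_up=False, reason=DislikeReason.RUDE)` records a thumbs-down with a reason, while passing a reason together with `thumbs_up=True` raises a `ValueError`.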
In their empirical study, the team compared BB3 with state-of-the-art baselines such as the SeeKeR language model and OPT-175B. In the evaluations, BB3-175B achieved the highest knowledgeable & engaging score and the lowest factual incorrectness score, and it outperformed existing publicly available chatbots by a wide margin.
Overall, this work introduces BB3 as a conversational agent that outperforms existing publicly available chatbots and continually learns to responsibly engage through its interactions. The team believes future AI research trends will involve continually learning and evolving agents, and plans to release the collected interactions dataset and model snapshots to encourage future research.
The model deployment is accessible on the BlenderBot 3 web page (currently US only). The paper BlenderBot 3: A Deployed Conversational Agent That Continually Learns to Responsibly Engage is on arXiv.
Author: Hecate He | Editor: Michael Sarazen