For years, researchers have been attempting to build AI systems that can converse as naturally as humans. So far, however, these efforts have succeeded only in narrow, specialized, pre-programmed tasks.
Three months ago, Google released Meena, a neural-network-powered chatbot. Meena significantly outperformed other chatbots on Sensibleness and Specificity Average (SSA), an evaluation metric that captures key elements of humanlike, multi-turn conversation.
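SSA is simply the average of two human-judged rates: the fraction of a chatbot's responses labeled sensible and the fraction labeled specific. A minimal sketch of the computation (the function name and input format are illustrative, not taken from the Meena paper):

```python
def ssa(labels):
    """Compute Sensibleness and Specificity Average (SSA).

    `labels` is a list of (sensible, specific) boolean pairs,
    one pair per human-evaluated chatbot response.
    """
    n = len(labels)
    sensibleness = sum(s for s, _ in labels) / n  # fraction judged sensible
    specificity = sum(sp for _, sp in labels) / n  # fraction judged specific
    return (sensibleness + specificity) / 2

# Example: 3 of 4 responses sensible, 2 of 4 specific -> SSA = 0.625
print(ssa([(True, True), (True, True), (True, False), (False, False)]))
```

Averaging the two rates penalizes the classic failure modes separately: a bot that is always safe but vague scores low on specificity, while one that is detailed but incoherent scores low on sensibleness.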
This week, Facebook responded with its new state-of-the-art, open-source chatbot, Blender.
“This is the first chatbot to blend a diverse set of conversational skills — including empathy, knowledge, and personality — together in one system,” Facebook researchers boasted in a blog post.
The researchers pitted Blender against Meena, reporting that “in human evaluations of engagingness our best model outperforms Meena in a pairwise comparison 75 percent to 25 percent, and in terms of humanness by 65 percent to 35 percent.”
In an associated paper, the researchers summarize their approach to building such open-domain chatbots, covering scale, skill blending, and generation strategies: “We achieved this milestone through a new chatbot recipe that includes improved decoding techniques, novel blending of skills, and a model with 9.4 billion parameters, which is 3.6x more than the largest existing system.”
The first step in Blender’s creation was large-scale training. The team pretrained large Transformer neural networks on previously available public-domain conversations comprising 1.5 billion training examples.
They also introduced a novel Blended Skill Talk (BST) dataset for training and evaluating desirable skills in Blender. Based on their previous research, the premise was that an ideal chatbot needs to be knowledgeable, have a personality, and be able to show empathy. Blending these skills, however, can be difficult, because systems must be able to switch between tasks when appropriate. The BST dataset provides a way to build systems that automatically and appropriately blend and exhibit these skills.
Generation strategies such as beam search, next-token sampling, and n-gram blocking are typically applied at decoding time to keep conversational agents from repeating themselves or dropping random, out-of-context words. The researchers analyzed these strategies, finding for example that the length of an agent’s utterances is particularly important for achieving better results with human evaluators.
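To illustrate one of these strategies, here is a minimal sketch of n-gram blocking, which forbids the decoder from emitting any token that would recreate an n-gram already present in the output (a generic illustration of the technique, not Facebook's implementation):

```python
def blocked_next_tokens(generated, n=3):
    """Return tokens that would repeat an n-gram already in `generated`.

    During decoding, a candidate token is blocked if appending it to the
    last n-1 generated tokens would recreate an n-gram seen earlier.
    """
    if len(generated) < n - 1:
        return set()
    # Collect every n-gram produced so far.
    seen = set()
    for i in range(len(generated) - n + 1):
        seen.add(tuple(generated[i:i + n]))
    # The next token extends this (n-1)-token prefix.
    prefix = tuple(generated[-(n - 1):])
    return {ng[-1] for ng in seen if ng[:-1] == prefix}

# "sat" is blocked: it would repeat the trigram ("the", "cat", "sat")
print(blocked_next_tokens(["the", "cat", "sat", "on", "the", "cat"]))
```

In practice a beam-search decoder would call a check like this at every step and set the log probability of blocked tokens to negative infinity.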
The researchers also noted that, contrary to some recent research suggesting beam search is inferior to sampling, careful choice of search hyperparameters can improve results by controlling trade-offs. “In particular, constraining the minimum beam length gives a crucial control of the dull versus spicy spectrum of responses.”
Despite the impressive performance achievements, the researchers stress that much work remains in solving open-domain conversation. They are currently exploring ways to improve the models’ quality in longer conversations with new architectures and different loss functions.
To encourage and assist in global conversational AI research, the team decided to release the complete model, code, and evaluation set-up to enable other AI researchers to reproduce, explore, and build on the work.
The paper Recipes for Building an Open-domain Chatbot is on arXiv, and the code can be found here.
Journalist: Yuan Yuan | Editor: Michael Sarazen