Automating human-like conversations is much more difficult than you might imagine. Even the world’s top conversational AI systems — Amazon Alexa, Apple’s Siri, and Google Assistant — remain far from the stage where they can smoothly process unanticipated human requests while also keeping humans engaged with natural responses.
At the RE•WORK AI Assistant Summit San Francisco yesterday, research scientists and engineers from Amazon, Apple and Google spoke on how they are addressing challenges and evolving their conversational AI systems.
Google – Building a Conversational Agent Overnight with Dialogue Self-play
First at the podium was Google AI Research Engineer Dr. Pararth Shah, who described a dilemma in developing a conversational AI bot: on one hand, a rule-based bot, which researchers code with each new capability, has low recall in unanticipated interactions and no self-learning capability; on the other hand, a research-focused bot whose model is trained from data encounters the double challenge of costly dataset collection and annotation, and diminished control over the bot’s behavior.
To achieve both control and flexibility at scale, Dr. Shah proposed using self-play to build conversational AI. Researchers would first create both a user simulator bot and a rule-based bot. Given a specific task scenario, for example a restaurant date chat or booking a movie ticket, the two bots would talk to each other for five minutes.
Their conversation would be translated into natural (colloquial) dialogue by crowdsourced human annotators, then added to the training dataset for a neural net-based bot, aka Supervised Learning. The neural net-based bot would then re-engage with the user simulator and receive rewards or penalties based on its performance, aka Reinforcement Learning. Finally the bot could converse with flesh-and-blood users to get real feedback, aka Interactive Reinforcement Learning.
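The first stage of the pipeline, two bots talking through a task, can be sketched roughly as follows. This is a minimal, hypothetical illustration of dialogue self-play for a movie-ticket booking scenario; the slot names, bot logic, and simulator are invented stand-ins, not Google's actual system:

```python
# Minimal sketch of dialogue self-play for a movie-ticket booking task.
# The rule-based bot, user simulator, and slot names are toy stand-ins,
# not Google's actual system.

SLOTS = ["movie", "time", "num_tickets"]

def user_simulator(goal, asked_slot):
    """Simulated user: answer whichever slot the bot requested."""
    return ("inform", asked_slot, goal[asked_slot])

def rule_based_bot(state):
    """Rule-based agent: request the first slot still missing."""
    for slot in SLOTS:
        if slot not in state:
            return ("request", slot)
    return ("confirm", None)

def self_play_episode(goal):
    """Let the two bots talk until the task completes; log every turn."""
    state, log = {}, []
    for _ in range(10):                      # cap dialogue length
        act, slot = rule_based_bot(state)
        log.append(("bot", act, slot))
        if act == "confirm":                 # all slots filled
            break
        _, u_slot, value = user_simulator(goal, slot)
        state[u_slot] = value
        log.append(("user", "inform", u_slot, value))
    return log

goal = {"movie": "Inception", "time": "7pm", "num_tickets": 2}
dialogue = self_play_episode(goal)
# Crowd workers would then paraphrase each structured act in `dialogue`
# into colloquial language before it joins the supervised training set.
```

The structured logs are cheap to generate at scale; the expensive human effort is reserved for paraphrasing, which is exactly the division of labor the self-play approach exploits.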
Apple – Siri’s Natural Language Understanding
Apple Research Scientist Dr. Alok Kothari outlined Siri’s Natural Language Understanding (NLU) development.
The initial version of Siri's NLU was a rule-based system: features came from vocabulary maps and external knowledge bases, intents were composed via rule-based bottom-up tree traversal of the query, and intent ranking relied on hand-coded weights. The system was deterministic and interpretable, and could easily handle unambiguous requests. However, researchers found it difficult to add new functionalities or improve accuracy.
Siri then proceeded to the next level, from rule-based to machine learned. The new iteration revamped how researchers designed each functionality. For example, Siri researchers reformulated the domain chooser's ranking problem as a classification problem and adopted Support Vector Machines (SVMs), a type of supervised learning model.
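The reformulation is easiest to see in code. The toy sketch below casts domain choosing as multiclass text classification; Siri used SVMs, but for a self-contained illustration this uses a simple perceptron over bag-of-words features, and all queries, domains, and labels are invented:

```python
from collections import defaultdict

# Toy illustration of domain choosing cast as text classification.
# Siri used SVMs; this uses a multiclass perceptron over bag-of-words
# features purely to show the problem framing. All data is made up.

train = [
    ("play some jazz music", "music"),
    ("set an alarm for seven", "alarm"),
    ("will it rain tomorrow", "weather"),
    ("play the latest album", "music"),
    ("wake me up at six", "alarm"),
    ("what is the forecast today", "weather"),
]
domains = sorted({d for _, d in train})

def features(text):
    return text.lower().split()

# One weight vector per domain (multiclass perceptron).
weights = {d: defaultdict(float) for d in domains}

def predict(text):
    scores = {d: sum(weights[d][w] for w in features(text)) for d in domains}
    return max(domains, key=lambda d: scores[d])

for _ in range(10):                # a few training passes
    for text, gold in train:
        guess = predict(text)
        if guess != gold:          # mistake-driven weight updates
            for w in features(text):
                weights[gold][w] += 1.0
                weights[guess][w] -= 1.0
```

Once domain choosing is framed this way, any off-the-shelf classifier (SVM, perceptron, or later a neural network) can be swapped in, which is precisely what made the later move to LSTMs possible.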
Researchers deployed two different strategies, tree structured parses and shallow parses, for parsing, a process that analyzes a string of natural language or computer language symbols in accordance with grammatical rules. Researchers discovered that the majority of Siri requests required only shallow parses, which were more accurate, faster to train, and better suited to producing annotations than tree structured parses. Researchers also began to apply a statistical modeling method, Conditional Random Fields (CRFs), to parsing.
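A shallow parse treats the query as a sequence labeling problem: each token gets a BIO tag (Begin/Inside/Outside a slot). CRFs learn such taggers from data; the sketch below uses a hypothetical hand-built lexicon instead, purely to show the output format a shallow parser produces:

```python
# Sketch of shallow parsing as BIO sequence labeling. Siri applied CRFs
# to learn such taggers; this toy lexicon-based tagger only illustrates
# the output format. The lexicon and query are invented.

LEXICON = {
    "jazz": "genre",
    "classical": "genre",
    "coltrane": "artist",
    "john": "artist",
}

def shallow_parse(query):
    """Tag each token with B-slot / I-slot / O (outside any slot)."""
    tokens = query.lower().split()
    tags, prev_slot = [], None
    for tok in tokens:
        slot = LEXICON.get(tok)
        if slot is None:
            tags.append("O")
        elif slot == prev_slot:
            tags.append("I-" + slot)   # continues the previous slot
        else:
            tags.append("B-" + slot)   # begins a new slot
        prev_slot = slot
    return list(zip(tokens, tags))

parse = shallow_parse("play some John Coltrane jazz")
# [('play', 'O'), ('some', 'O'), ('john', 'B-artist'),
#  ('coltrane', 'I-artist'), ('jazz', 'B-genre')]
```

Flat tags like these are far easier for annotators to produce consistently than full syntax trees, which is one reason shallow parses suited most Siri requests.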
To support the new machine learned models, Apple researchers built a better UI for annotators and developers, a training and prediction system to evaluate the impact of new or edited examples, deployment of models to runtime servers, and metrics to track performance.
As deep learning blossomed over the last five years, Siri researchers realized it could deliver better performance than traditional machine learning methods, especially in data-intensive conditions. They replaced their previous models, SVMs for domain choosing and CRFs for parsing, with Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network capable of learning long-term dependencies.
Siri researchers chose LSTM for good reason. In the domain choosing task, for example, LSTM applies one model across all domains, shrinks the feature space to 1,300 dimensions (SVMs required 500k), captures long-range dependencies in queries, and achieves better accuracy and generalization.
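The long-range dependency claim comes from the LSTM's gating mechanism. Below is a single-unit LSTM cell step in scalar form, with illustrative (untrained) weights chosen so the forget gate stays open; it is a generic sketch of the standard LSTM equations, not Siri's model:

```python
import math

# One timestep of a one-unit LSTM cell, illustrating the gating that
# lets LSTMs carry information across long spans. Weights below are
# illustrative, not trained, and this is not Siri's actual model.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """Standard LSTM cell equations with scalar weights in dict p."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate
    c = f * c_prev + i * g        # cell state: keep old memory, add new
    h = o * math.tanh(c)          # exposed hidden state
    return h, c

# With the forget gate biased wide open (bf >> 0), the cell state
# persists across many steps even with zero input: "long-term memory".
p = {k: 0.0 for k in
     ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")}
p["bf"] = 10.0   # forget gate ~1: retain memory
p["bo"] = 10.0   # output gate ~1: expose the cell state
h, c = 0.0, 1.0  # start with a memorized value in the cell state
for _ in range(50):
    h, c = lstm_step(0.0, h, c, p)
# c remains close to 1.0 after 50 steps of zero input
```

A learned forget gate lets the network decide per query which earlier words to keep around, which is what a fixed-window SVM feature set cannot do.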
Dr. Kothari finished with an overview of other fields Siri's NLU researchers are exploring, including semantic labeling, reinforcement learning, question answering, conversation modeling, etc.
Alexa Prize – Advancing the State of the Art in Conversational AI
Amazon Alexa is leading the conversational AI race with a 70 percent smart speaker market share. The company is aggressively working to build on this advantage by organizing contests with enticing cash prizes for third-party developers.
Amazon Lab126 AI Scientist Dr. Chandra Khatri introduced the new Amazon Alexa Prize, a competition that challenges participants to create state-of-the-art social chatbots that can converse coherently and engagingly on popular topics for 20 minutes. The first-of-its-kind challenge invites the world’s top universities and institutes to submit solutions.
The University of Washington team Sounding Board took first place and US$500,000 in prize money at the inaugural Amazon Alexa Challenge with a chatbot that held conversations with an average duration of 10 minutes and 22 seconds, earning a score of 3.17 out of 5 from judges.
Amazon Alexa hopes the challenge can crowdsource human intelligence to solve today's conversational AI challenges such as conversational Automatic Speech Recognition and NLU, dialog planning and context modeling, response generation, ranking and selection, knowledge ingestion and reasoning, etc.
Dr. Khatri introduced a couple of techniques adopted by university teams. In dialogue management and context modeling for example, the winning UW team proposed a hybrid dialogue manager wherein a master manages the overall conversation and a collection of miniskills manage different conversation segments. Another top team designed a state graph to track dialog content, conversation state, feedback, user sentiments, and personalization.
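The master/miniskill split can be sketched as a simple dispatcher. This is a hypothetical illustration in the spirit of that design, not the UW team's code; the skill names, keyword matching, and canned replies are invented:

```python
# Sketch of a hybrid dialogue manager: a master routes each utterance
# to a "miniskill" owning that conversation segment. Skill logic,
# keywords, and replies are invented illustrations, not the UW system.

class MiniSkill:
    def __init__(self, name, keywords, reply):
        self.name = name
        self.keywords = keywords
        self.reply = reply

    def can_handle(self, utterance):
        return any(k in utterance.lower() for k in self.keywords)

    def respond(self, utterance):
        return self.reply

class MasterDialogueManager:
    """Chooses which miniskill owns the current conversation segment."""

    def __init__(self, skills, fallback="Tell me more about that."):
        self.skills = skills
        self.fallback = fallback
        self.active = None          # which segment owner is engaged

    def respond(self, utterance):
        for skill in self.skills:
            if skill.can_handle(utterance):
                self.active = skill.name
                return skill.respond(utterance)
        return self.fallback        # no skill claimed the utterance

manager = MasterDialogueManager([
    MiniSkill("movies", ["movie", "film"], "Seen any good films lately?"),
    MiniSkill("sports", ["game", "team"], "Which team do you follow?"),
])
reply = manager.respond("I watched a great movie yesterday")
```

Keeping segment-specific logic inside miniskills lets each one be developed and tested independently, while the master handles the overall flow, one plausible reading of why the hybrid design held 10-minute conversations together.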
One researcher to benefit from Amazon's conversational AI development push is Dr. Zhou Yu, an Assistant Professor at the University of California, Davis, who created a multi-modal chatbot with strategies designed to keep users engaged in the conversation. The project caught Amazon's interest and garnered Dr. Yu an annual research sponsorship of US$100,000.
Humans are generally embracing conversational AI — according to a recent NPR and Edison Research study, 65 percent of American smart speaker users say they could not return to a life without them. But on the flipside, playing music, weather notifications and responding to general questions remain the top three relatively mundane tasks these speakers perform. Expanding capabilities and developing a natural conversational AI is the target that tech giants like Amazon, Apple and Google will aim for over the coming years.
Journalist: Tony Peng | Editor: Michael Sarazen