Getting lost in an unfamiliar place can be a hassle, and a common solution is to ask a local for directions. Facebook AI Research (FAIR) wants its AI agents to become that savvy local guide.
The new FAIR and MILA (Montreal Institute for Learning Algorithms) paper Talk the Walk: Navigating New York City through Grounded Dialogue introduces what it terms “the first large-scale dialogue dataset grounded in action and perception.” The research stages a natural language exchange between an “AI guide,” which sees a traditional 2D New York City street map, and an “AI tourist,” which views real-world 360-degree photos of NYC streets.
In a game-like setting, the two agents are tasked with navigating the “tourist” to a target destination while communicating with each other using natural language. FAIR says the system is the first to combine perception, action, and interactive dialogue in a problem-solving process.
The paper introduces a Masked Attention for Spatial Convolutions (MASC) mechanism that helps the “guide” bot localize the “tourist” on its map by performing state transitions conditioned on the “tourist” bot’s messages. The team found that MASC doubled the system’s accuracy.
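To give a rough intuition for MASC’s state transitions, the sketch below models the guide’s belief over the tourist’s location as a probability grid over map intersections and shifts it with a small mask. In the actual model the mask is learned from the tourist’s messages; here it is hard-coded, and the function name `masc_step` and the grid sizes are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def masc_step(belief, mask):
    """Shift the guide's belief over the tourist's map location by
    cross-correlating it with a 3x3 transition mask.

    Illustrative sketch only: in Talk the Walk the mask would be
    predicted from the tourist's natural-language messages."""
    H, W = belief.shape
    padded = np.pad(belief, 1)  # zero-pad so edges stay in bounds
    out = np.zeros_like(belief)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * mask)
    return out

# Toy example: belief fully concentrated at one intersection of a 4x4 map.
belief = np.zeros((4, 4))
belief[1, 1] = 1.0

# Hard-coded mask standing in for a learned, message-conditioned one;
# this particular mask shifts all probability mass one cell to the right.
mask = np.zeros((3, 3))
mask[1, 0] = 1.0

new_belief = masc_step(belief, mask)  # mass moves from (1, 1) to (1, 2)
```

A trained model would predict a different mask for each communicated action (e.g., “turn left, go one block”), applying one transition per step to track the tourist.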
FAIR is releasing the Talk the Walk project code, which the team believes can be a useful resource for AI researchers developing grounded language learning algorithms and for other research combining language, perception, and action.
Author: Robert Tian | Editor: Michael Sarazen