Text-based games have become a popular testbed for developing and testing reinforcement learning (RL) algorithms that process and respond to natural language inputs. This avenue of research aims to build autonomous agents that can utilize a semantic understanding of texts; in other words, agents intelligent enough to "understand" the meanings of words and phrases the way humans do in order to succeed in such games.
But it hasn’t worked out that way, according to a new study from Princeton University and Microsoft Research. The team makes the surprising discovery that current autonomous language-understanding agents are capable of achieving high scores even in the complete absence of language semantics, indicating such RL agents for text-based games might not be sufficiently leveraging the semantic structure of the texts they encounter.
To remedy this deficiency and produce agents with stronger semantic understanding, the team proposes an inverse dynamics decoder designed to regularize the representation space and encourage the encoding of more game-related semantics.
Previous work has deployed a spectrum of language processing methods for text-based games, including word vectors, neural networks, pretrained language models, open-domain question answering systems, knowledge graphs and reading comprehension systems. All these methods are based on RL frameworks, which treat text games as special instances of a partially observable Markov decision process (POMDP), where agents can perform actions that affect the system with the goal of maximizing a reward that depends on the sequence of system states and agent actions. Because both the actions and the observations live in the language space, whatever semantics an agent can decipher is attached to these text observations and actions.
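The POMDP framing above can be made concrete with a small sketch. The toy game, its rooms, and its commands below are all hypothetical and purely illustrative: the agent never sees the underlying state, only a text observation and the list of currently valid text actions.

```python
from dataclasses import dataclass

@dataclass
class TextGameStep:
    observation: str     # text the agent sees (a partial view of the state)
    valid_actions: list  # text commands accepted in the current state
    reward: float
    done: bool

class ToyTextGame:
    """A hypothetical two-room text game, for illustration only."""
    def __init__(self):
        self.state = "hallway"  # hidden state, never shown to the agent

    def step(self, action: str) -> TextGameStep:
        if self.state == "hallway" and action == "open door":
            self.state = "treasure room"
            return TextGameStep("You enter the treasure room.",
                                ["take gold"], 0.0, False)
        if self.state == "treasure room" and action == "take gold":
            return TextGameStep("You take the gold!", [], 10.0, True)
        return TextGameStep("Nothing happens.", ["open door"], 0.0, False)
```

An agent interacting with such an environment must map observation and action texts to behavior, which is exactly where semantics could, in principle, help.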
In the paper Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents, the researchers set out to discover to what extent current RL agents leverage semantics in text-based games under three setups: Reducing Semantics via Minimizing Observation (MIN-OB), Breaking Semantics via Hashing (HASH), and Regularizing Semantics via Inverse Dynamics Decoding (INV-DY). They employ a Deep Reinforcement Relevance Network (DRRN) as their baseline RL agent. The DRRN learns a Q-network Qφ(o, a), encodes the observation and each action candidate using two separate gated recurrent units (GRU) encoders, and then aggregates the representations to derive the Q-value through a multilayer perceptron (MLP) decoder.
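The DRRN scoring scheme can be sketched as follows. This is not the authors' code: a mean of randomly initialized word embeddings stands in for the paper's GRU encoders, and the untrained MLP weights are hypothetical. The structure, however, mirrors the description above: separate encoders for observation and action texts, with an MLP decoding the aggregated representations into a scalar Q-value.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16

def make_encoder():
    """Stand-in for a GRU encoder: mean of lazily assigned word embeddings."""
    table = {}
    def encode(text: str) -> np.ndarray:
        vecs = []
        for tok in text.lower().split():
            if tok not in table:
                table[tok] = rng.normal(size=DIM)
            vecs.append(table[tok])
        return np.mean(vecs, axis=0)
    return encode

encode_obs = make_encoder()  # two separate encoders, as in DRRN
encode_act = make_encoder()

# Hypothetical MLP decoder weights (untrained, for illustration only)
W1 = rng.normal(size=(2 * DIM, 32))
w2 = rng.normal(size=32)

def q_value(observation: str, action: str) -> float:
    """Aggregate the two representations and decode a scalar Q-value."""
    h = np.concatenate([encode_obs(observation), encode_act(action)])
    return float(np.maximum(W1.T @ h, 0.0) @ w2)  # one-hidden-layer ReLU MLP

def best_action(observation: str, valid_actions: list) -> str:
    """Act greedily over the valid action candidates."""
    return max(valid_actions, key=lambda a: q_value(observation, a))
```

In training, the Q-network's parameters would be optimized against a temporal difference loss; here the weights are frozen at random values purely to show the data flow.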
At each step in a text game, the (valid) action space changes, revealing useful information about the current state. In the MIN-OB setup, the researchers minimize the observation to just a location phrase in order to isolate the semantics carried by the actions.
The two GRU encoders in the Q-network ensure that similar texts receive similar representations. To test whether this semantic continuity is actually useful, the team breaks the two encoders by hashing the observation and action texts (HASH), so that distinct observations and actions can still be told apart while any semantic similarity between texts is discarded.
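A minimal sketch of the hashing idea, assuming the learned encoder is replaced by a fixed hash-seeded random embedding (the dimensionality and hash function here are illustrative choices, not the paper's): identical texts always map to identical vectors, so the agent can still identify states and actions, but near-identical texts get completely unrelated vectors, destroying semantic similarity.

```python
import hashlib
import numpy as np

DIM = 16

def hash_embed(text: str) -> np.ndarray:
    """Map a text to a fixed pseudo-random vector seeded by its hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    return np.random.default_rng(seed).normal(size=DIM)

# Same text -> same vector; near-identical texts -> uncorrelated vectors.
v1 = hash_embed("open the wooden door")
v2 = hash_embed("open the wooden door")
v3 = hash_embed("open the wooden doors")
```

If an agent using such representations still scores well, its success cannot be coming from semantic similarity between texts, which is precisely the probe the HASH setup provides.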
Finally, the researchers regulate semantics via an INV-DY approach. As the GRU representations in DRRN are only optimized for the temporal difference loss, text semantics can degenerate during encoding, and the text representations might arbitrarily overfit to the Q-values. To prevent this, INV-DY serves to regularize both action and observation representations to avoid degeneration by decoding back to the textual domain, to encourage the GRU encoders to encode action-relevant parts of observations, and to provide intrinsic motivation for exploration.
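The inverse dynamics idea behind INV-DY can be sketched as an auxiliary prediction problem: given the representations of two consecutive observations, a decoder must recover the action taken between them. The sketch below is illustrative rather than the authors' implementation (it predicts an action index with a linear decoder, whereas the paper decodes back to text), but it shows why the auxiliary loss forces the encoder to retain action-relevant information instead of overfitting to Q-values.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_ACTIONS = 8, 4

# Hypothetical decoder weights for the auxiliary head
W = rng.normal(size=(2 * DIM, N_ACTIONS)) * 0.1

def inv_dyn_loss(rep_t, rep_next, action_id, W):
    """Cross-entropy loss for predicting the action from (o_t, o_{t+1})."""
    h = np.concatenate([rep_t, rep_next])
    logits = h @ W
    logits = logits - logits.max()                   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax
    return -log_probs[action_id]                     # negative log-likelihood

# A transition the decoder finds hard to explain (high loss) signals novelty,
# which can be reused as an intrinsic motivation bonus for exploration.
rep_t = rng.normal(size=DIM)
rep_next = rng.normal(size=DIM)
loss = inv_dyn_loss(rep_t, rep_next, action_id=2, W=W)
```

Training the observation encoder jointly on this loss and the temporal difference loss regularizes the representation space, matching the three goals the paragraph above lists: preventing degeneration, encoding action-relevant observation features, and driving exploration.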
The team conducted three experiments to probe the effects of different semantic representations on 12 interactive fiction games from the Jericho benchmark.
The MIN-OB setup achieved similar maximum scores on most games compared to the base DRRN, but failed to reach high episodic scores, which suggests the importance of identifying different observations using language details. Most surprisingly, HASH almost doubled the DRRN final score on PENTARI, indicating that the DRRN model can have high performance without leveraging any language semantics. For INV-DY on the game ZORK I, the maximum observed score was 87, while the other models did not exceed 55. The study’s results demonstrate the potential benefits of developing RL agents with more semantic representations and “a finer grasp of natural language.”
An early version of the paper Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents was featured in the NeurIPS 2020 workshop Wordplay: When Language Meets Games. The updated paper is available on arXiv.
Author: Hecate He | Editor: Michael Sarazen