“Generalization” is an AI buzzword these days for good reason: most scientists would love to see the models they’re training in simulations and video game environments evolve and expand to take on meaningful real-world challenges in areas such as safety, conservation and medicine.
One research area where this matters is deep reinforcement learning (DRL), which combines deep learning architectures with reinforcement learning algorithms so that AI agents can learn the best possible actions for attaining their goals in virtual environments. DRL has been widely applied in games and robotics.
Such DRL agents have an impressive track record on StarCraft II and Dota 2. But because they are trained in fixed environments, studies suggest DRL agents can fail to generalize to even slight variations of their training environments.
In a new paper, researchers from New York University and Modl.ai, a company applying machine learning to game development, suggest that simple spatial processing methods such as rotation, translation and cropping could help increase model generality.
The ability to learn directly from pixels as output by various games was one of the reasons for DRL’s surge in popularity over the last few years. But many researchers have begun to question what the models actually learn from those pixels. One way to investigate what models trained with DRL learn from pixel data is by studying their generalization capacity.
Starting from the hypothesis that DRL cannot easily learn generalizable policies on games using a static third-person perspective, the researchers discovered that the lack of generalization is partly due to the input representations. This means that while DRL models for games with static third-person representations do not tend to learn generalizable policies, they have a better chance of doing so if the game is “seen” from a more agent-centric perspective.
Because an agent’s immediate surroundings can greatly affect its ability to learn in DRL scenarios, the team proposed providing agents with a first-person view. They applied three basic image processing techniques — rotating, translating, and cropping — to the observable areas around agents.
Rotation keeps the agent always facing forward, so any action it takes always happens from the same perspective. Translation then centers the observation on the agent, so it is always in the middle of its own view. Finally, cropping shrinks the observation down to just the local information around the agent.
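To make the three steps concrete, here is a minimal sketch in Python with NumPy. It is not the researchers' implementation: it assumes the game state is available as a 2D grid of tile IDs rather than raw pixels, and the function name `agent_centric_view`, its parameters, the heading labels and the padding value are all hypothetical choices made for illustration. Padding and slicing handle translation and cropping together, and `np.rot90` turns the window so the agent's heading points up.

```python
import numpy as np

# Counterclockwise quarter turns that bring each heading to "up".
# The heading labels are an assumption made for this sketch.
_QUARTER_TURNS = {"up": 0, "right": 1, "down": 2, "left": 3}


def agent_centric_view(grid, agent_pos, agent_dir, radius, pad_value=0):
    """Return a (2*radius+1) square view centered on the agent, facing up.

    grid      : 2D array of tile IDs (assumed representation, not pixels)
    agent_pos : (row, col) of the agent in the grid
    agent_dir : one of "up", "right", "down", "left"
    radius    : number of tiles kept on each side of the agent
    pad_value : tile ID used for cells that fall outside the level
    """
    grid = np.asarray(grid)
    row, col = agent_pos

    # Translation: pad the level so a window centered on the agent
    # always fits, even when the agent stands next to a border.
    padded = np.pad(grid, radius, mode="constant", constant_values=pad_value)

    # Cropping: the agent now sits at (row + radius, col + radius),
    # so the window centered on it starts at (row, col).
    size = 2 * radius + 1
    window = padded[row:row + size, col:col + size]

    # Rotation: turn the window so the agent's heading points up.
    return np.rot90(window, k=_QUARTER_TURNS[agent_dir])


# Example: a 5x5 level, agent (1) in the middle facing right,
# with an object (2) one tile in front of it.
level = np.zeros((5, 5), dtype=int)
level[2, 2] = 1
level[2, 3] = 2
view = agent_centric_view(level, (2, 2), "right", radius=1)
print(view)
```

In the resulting 3x3 view the agent sits in the center cell and the object appears directly above it, whichever way the agent was facing in the original level. That consistency is the property the transformations are meant to give the policy.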
In their experiments, the researchers observed that these three simple transformations enable agents to learn better, and that the policies they learn generalize much better to new environments.
The technique has so far only been tested on two game variants — a GVGAI port for the dungeon system in The Legend of Zelda and a simplified version of the game, Simple Zelda. For future work, the researchers intend to continue testing the generalization effects on different games, and improve their understanding of the effects of each transformation.
The paper Rotation, Translation, and Cropping for Zero-Shot Generalization is on arXiv.
Journalist: Yuan Yuan | Editor: Michael Sarazen