Good gamers can tune out distractions and unimportant on-screen information, focusing their attention on avoiding obstacles and overtaking others in virtual racing games like Mario Kart. But can machines behave similarly in such vision-based tasks? One possible solution is to design agents that encode and process abstract concepts, and research in this area has focused on learning all abstract information from visual inputs. This, however, is compute-intensive and can even degrade model performance. Now, researchers from Google Brain Tokyo and Google Japan have proposed a novel approach that helps guide reinforcement learning (RL) agents toward what’s important in vision-based tasks.
The researchers say that just as the human brain assigns most of its attention capacity to task-relevant elements and becomes temporarily blind to other signals, their proposed agent learns to ignore all but the task-critical regions in input images.
The team characterizes current approaches, in which gradient descent or evolution strategies optimize network weight parameters directly, as direct encoding methods. They propose instead treating self-attention as a form of indirect encoding, in which large implicit weight matrices are generated from a small number of key-query parameters, making it possible to construct highly parameter-efficient agents in a simple but powerful way. The researchers used neuroevolution techniques to train the self-attention agents. This removed the complexity that gradient-based methods would otherwise require, resulting in simpler architectures. Because neuroevolution does not rely on gradients, the team could also incorporate non-differentiable modules that make self-attention more effective.
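To make the idea of indirect encoding concrete, here is a minimal NumPy sketch of how a small pair of key and query matrices can produce a much larger attention matrix over image patches, which is then used to pick the most important patches. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 96x96 input, the 7x7 patches with stride 4, the projection size of 4, the scaling factor, and the choice of 10 patches are all hypothetical values chosen for the example, and the weights are random rather than evolved.

```python
import numpy as np


def extract_patches(image, patch_size, stride):
    """Slide a square window over an (H, W, C) image and flatten each patch."""
    h, w, _ = image.shape
    patches = [
        image[r:r + patch_size, c:c + patch_size].reshape(-1)
        for r in range(0, h - patch_size + 1, stride)
        for c in range(0, w - patch_size + 1, stride)
    ]
    return np.stack(patches)  # shape: (N, patch_size * patch_size * C)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def patch_importance(patches, w_key, w_query):
    """Score each patch by how much attention it receives from all patches."""
    keys = patches @ w_key        # (N, d)
    queries = patches @ w_query   # (N, d)
    d = w_key.shape[1]
    # The N x N attention matrix is computed on the fly from the two small
    # projection matrices; it is never stored as a set of parameters.
    # Scaling by sqrt(d) is an assumption made for this sketch.
    attention = softmax(keys @ queries.T / np.sqrt(d), axis=-1)
    return attention.sum(axis=0)  # column sums, one score per patch


def top_k_patches(importance, k):
    """Return indices of the k highest-scoring patches (a non-differentiable step)."""
    return np.argsort(importance)[::-1][:k]


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
image = rng.random((96, 96, 3))
patches = extract_patches(image, patch_size=7, stride=4)  # 23 x 23 = 529 patches

d_in, d = patches.shape[1], 4                 # d_in = 7 * 7 * 3 = 147
w_key = rng.normal(size=(d_in, d))            # random stand-ins for evolved weights
w_query = rng.normal(size=(d_in, d))

importance = patch_importance(patches, w_key, w_query)
selected = top_k_patches(importance, k=10)

print("key/query parameters:", w_key.size + w_query.size)     # 1176
print("implicit attention entries:", len(patches) ** 2)       # 279841
print("selected patch indices:", selected)
```

In this sketch, 1,176 key-query parameters give rise to an attention matrix with 279,841 entries, which is the sense in which a small number of parameters encodes a large implicit weight matrix. The sort-and-select step in `top_k_patches` has no useful gradient, which is one reason a gradient-free optimizer such as neuroevolution is a natural fit for this kind of agent.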
The research team evaluated the method on two challenging vision-based RL tasks: CarRacing and DoomTakeCover. In experiments, the proposed method solved both tasks and outperformed existing methods while requiring 1000x fewer parameters. The proposed agents also outperformed conventional methods in their ability to generalize to environments with different task-irrelevant elements. The researchers further noted that visualizing the attention patches in pixel space made the agent’s decision process easier for humans to understand.
Alongside the approach's state-of-the-art performance, the researchers also identified some limitations. For example, much of the extra generalization capability is due to “attending to the right thing, rather than from logical reasoning”. The visual module also struggles to generalize when backgrounds change dramatically.
The paper Neuroevolution of Self-Interpretable Agents is on arXiv.
Author: Yuqing Li | Editor: Michael Sarazen