DeepMind researchers have uncovered parallels between how brains react to dopamine and the trending AI technique of distributional reinforcement learning. The findings lend support to the promise of distributional reinforcement learning and prompted the DeepMind researchers to proudly proclaim that “now AI research is on the right path.”
In the new study, researchers from DeepMind and Harvard University analyzed the activity of dopamine cells in mice and discovered that individual dopamine neurons predict rewards at different levels, ranging from “pessimistic” to “optimistic.” Using distributional TD algorithms, one of the simplest forms of distributional RL, the researchers hope to study and explain the effects of dopamine on behaviour, emotions and more.
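The pessimistic-to-optimistic spectrum can be illustrated with a minimal sketch of a distributional TD update. This is an illustrative assumption, not the study's actual code: each value predictor carries its own learning rates for positive and negative prediction errors, and the imbalance between the two determines how optimistic its reward estimate becomes.

```python
import random

def update(value, reward, alpha_plus, alpha_minus):
    """One distributional TD update for a single-step reward prediction."""
    delta = reward - value                   # reward prediction error
    if delta > 0:
        return value + alpha_plus * delta    # scale up surprising gains
    return value + alpha_minus * delta       # scale up surprising shortfalls

random.seed(0)
optimist, pessimist = 0.0, 0.0
for _ in range(5000):
    r = random.choice([1.0, 10.0])           # two equally likely reward sizes
    # alpha_plus > alpha_minus -> an "optimistic" cell, settling near the top
    optimist = update(optimist, r, 0.1, 0.02)
    # alpha_minus > alpha_plus -> a "pessimistic" cell, settling near the bottom
    pessimist = update(pessimist, r, 0.02, 0.1)

print(round(optimist, 1), round(pessimist, 1))
```

Both predictors see the same rewards, yet they converge to different estimates purely because of their asymmetric learning rates, mirroring how individual dopamine neurons can carry different reward predictions.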
In the experiments the mice received rewards of unpredictable size, and the goal was to assess whether dopamine neuron activity was more consistent with standard TD (temporal difference) learning or with distributional TD. The results showed reliable differences between individual dopamine cells: some predicted very large rewards, while others predicted very small rewards.
In many cases, especially in real-world situations, future rewards are not a completely known quantity but rather predictions based on a specific behaviour that involves some randomness. For example, if a humanoid AI agent in a simulation attempts to jump across a virtual gap, there are two possible outcomes: success (reaching the other side) or failure (falling into the gap). Whereas the standard TD algorithm learns to predict only the average future reward, distributional TD algorithms learn to predict the full range of future rewards, in this case a two-peaked distribution of potential returns. Distributional reinforcement learning has been used successfully to build agents for games such as Go and StarCraft.
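The gap-jump scenario can be sketched as follows. The reward values, success probability, and learning rates here are illustrative assumptions: a single standard TD predictor tracks only the average return, while a small population of asymmetric predictors spreads out to cover both peaks of the bimodal outcome.

```python
import random

random.seed(1)
P_SUCCESS = 0.5                              # assumed chance of clearing the gap
taus = [0.1, 0.3, 0.5, 0.7, 0.9]             # degree of optimism per predictor
population = [0.5] * len(taus)               # distributional TD predictors
mean_value = 0.5                             # single standard TD predictor

for _ in range(20000):
    # bimodal reward: 1.0 for a successful jump, 0.0 for a fall
    r = 1.0 if random.random() < P_SUCCESS else 0.0
    mean_value += 0.01 * (r - mean_value)    # standard TD: tracks the average
    for i, tau in enumerate(taus):
        delta = r - population[i]
        # the optimism level tau splits the learning rate asymmetrically
        alpha = 0.01 * (tau if delta > 0 else (1 - tau))
        population[i] += alpha * delta

print(round(mean_value, 2))                  # close to the 0.5 average reward
print([round(v, 2) for v in population])     # estimates fan out from low to high
```

The standard predictor ends up near 0.5, a value that never actually occurs, while the population of distributional predictors spans estimates from near-failure to near-success, capturing the two-peaked shape of the return distribution.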
The research raises many new questions for neuroscientists to ponder. What if the brain selectively “listens” to optimistic or pessimistic dopamine neurons — could that be the cause of impulsive behaviour or depression? Once an animal learns the mechanism for assigning rewards, how will this representation be used in its downstream tasks? And how is this variability in optimism across dopamine cells related to other known forms of variability in the brain?
By posing such questions, the DeepMind researchers hope to spur the development of neuroscience research and, in doing so, form a virtuous circle that also benefits artificial intelligence research.
The paper A Distributional Code for Value in Dopamine-based Reinforcement Learning is on Nature.
Author: Reina Qi Wan | Editor: Michael Sarazen