Given the proven power of deep learning in providing fast function approximations for a wide range of complex real-life problems, it has become common for machine learning practitioners to apply differentiable programming techniques, computing gradients through entire programs to drive the optimization of neural networks.
The gradients that differentiable programming yields, however, are not always algorithmically useful, and for certain system dynamics they can actively mislead optimization. In the new paper Gradients Are Not All You Need, a research team from Google Brain and Radboud University examines a failure mode that arises in a variety of differentiable settings, ranging from recurrent neural networks and numerical physics simulation to the training of learned optimizers.
Machine learning techniques often involve differentiating through some iterative system, and the resulting gradients can seem a panacea for neural network optimization, stochastic control policies, physics simulation, and more. Chaos, however, emerges naturally in iterated maps, and randomness can enter from several sources: 1) different minibatches of data in neural network training and learned optimization; 2) environments in reinforcement learning; and 3) floating-point noise in how calculations are handled on an accelerator in physics simulation. In such cases, the chaotic dynamics result in poorly behaved gradients.
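To see why chaos poisons gradients, consider a minimal sketch (our own illustration, not an example from the paper): the classic logistic map is chaotic, and the gradient of the final state with respect to the initial state is a product of per-step Jacobians whose magnitude grows exponentially with the number of iterations.

```python
# Illustrative sketch (not from the paper): the logistic map
# x_{t+1} = r * x_t * (1 - x_t) is chaotic at r = 4. The gradient
# d x_T / d x_0 is the product of per-step derivatives r * (1 - 2 * x_t),
# which grows exponentially with the number of steps, so gradients
# through long unrolls of such a system are poorly behaved.

def iterate_and_grad(x0, r=4.0, steps=50):
    x, grad = x0, 1.0
    for _ in range(steps):
        grad *= r * (1.0 - 2.0 * x)  # chain rule through one iteration
        x = r * x * (1.0 - x)
    return x, grad

_, g10 = iterate_and_grad(0.3, steps=10)
_, g50 = iterate_and_grad(0.3, steps=50)
print(abs(g10), abs(g50))  # gradient magnitude explodes with unroll depth
```

With a positive Lyapunov exponent, the gradient magnitude roughly doubles per step, which is the same mechanism that makes backpropagation through chaotic physics simulators or long RNN unrolls unstable.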
The researchers propose several options to consider when gradients are unreliable for system optimization. They suggest choosing well-behaved systems, for instance by changing the initialization and recurrent structure of recurrent neural networks, or by selecting well-behaved proxy objectives in statistical physics systems. Another possibility is truncated backpropagation through time. Instead of taking true gradients, it may also be possible to train systems with gradient clipping. Finally, they suggest it can be advantageous to simply resort to black-box methods to estimate gradients.
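The last option, black-box gradient estimation, can be sketched with a simple evolution-strategies-style estimator (the function names and parameters here are our own illustration, not the paper's implementation): rather than backpropagating through a possibly chaotic system, it perturbs the parameter with Gaussian noise and averages the resulting objective changes.

```python
import random

# Illustrative sketch (our own, not the paper's code): a black-box
# gradient estimate via Gaussian smoothing (evolution strategies).
# The estimator grad f(theta) ~ E[(f(theta + s*eps) - f(theta - s*eps)) * eps / (2*s)]
# needs only function evaluations, never backpropagation.

def es_gradient(f, theta, sigma=0.1, samples=2000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        eps = rng.gauss(0.0, 1.0)
        # antithetic sampling (+eps and -eps) reduces estimator variance
        total += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return total / (2.0 * sigma * samples)

# Usage: for f(x) = x**2 the true gradient at theta = 3.0 is 6.0.
g = es_gradient(lambda x: x * x, 3.0)
print(g)  # close to 6.0
```

The trade-off is variance: such estimators are noisier than true gradients, but they remain well behaved even when the underlying system is chaotic.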
The paper provides an insightful look at chaos as a potential issue when computing gradients through dynamical systems, which the researchers hope can shed light on when gradients should and should not be used and what to do when gradients are not working. Co-author Luke Metz summed it up in a tweet: “Take gradients with care. Just because you can backprop doesn’t mean you should!”
The paper Gradients Are Not All You Need is on arXiv.
Author: Hecate He | Editor: Michael Sarazen