Deep Neural Networks (DNNs) have achieved outstanding results across a wide range of tasks in computer vision and natural language processing. These achievements, however, come at a high cost, as solving increasingly complex tasks requires increasingly deep neural network architectures. Today's deepening architectures not only increase the computational burden, they can also suffer from vanishing gradient problems.
Recent efforts to tackle the vanishing gradient problem in DNN training have leveraged advanced optimizers such as adaptive moment estimation (Adam), but such existing optimizers exploit only the magnitude of the gradients and ignore their angular (directional) information.
To overcome these limitations, a research team has proposed AngularGrad, a novel optimization algorithm that takes both gradient direction and angular information into consideration. The proposed method reduces the zig-zag effect in the optimization trajectory and speeds up convergence.
DNN training can be interpreted as obtaining a mathematical function that maps an input to a corresponding output by adjusting its parameters (weights and biases) to optimize a cost or loss function. During this process, optimal weights are found through an iterative procedure that automatically adjusts their values to reach the minimum of the loss function. The optimizer thus plays a crucial role in the efficiency of the training process and the final generalization performance of a DNN.
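The iterative procedure described above can be sketched with plain gradient descent on a toy regression problem (the dataset, learning rate, and iteration count below are illustrative choices, not from the paper):

```python
import numpy as np

# Toy dataset: learn w, b so that y ≈ w*x + b (the true mapping is y = 2x + 1).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # parameters (weight and bias), initialized arbitrarily
lr = 0.05         # learning rate

for _ in range(2000):
    y_hat = w * x + b
    # Gradients of the mean-squared-error loss w.r.t. w and b.
    grad_w = 2.0 * np.mean((y_hat - y) * x)
    grad_b = 2.0 * np.mean(y_hat - y)
    w -= lr * grad_w  # step opposite the gradient to reduce the loss
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges near w=2, b=1
```

Every optimizer mentioned in this article (SGDM, Adam, AngularGrad) is a refinement of this basic update rule; they differ in how each step's direction and size are computed from the gradients.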
The researchers say theirs is the first study to utilize a gradient vector's direction and angle information as well as its magnitude. This approach significantly smooths trajectory fluctuations and traces a more direct path towards the optimal solution of the cost function. AngularGrad also reduces the required computational resources, leading to improved training efficiency and performance.
AngularGrad takes the angle between two consecutive gradients into account during optimization. The team introduces a new angular coefficient to dynamically adjust the learning rate, enabling AngularGrad to control parameter adjustments and reduce the high variance of the gradients by accounting for the direction cosine of two consecutive gradients at each step. The team proposes two versions of the optimizer, depending on whether the cosine or the tangent of the angle is used to compute the angular coefficient, denoted AngularGrad^cos and AngularGrad^tan respectively.
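The idea of modulating step size by the angle between consecutive gradients can be sketched as follows. This is a simplified illustration, not the authors' exact formulation: the coefficient's constants (`lam1`, `lam2`) and the Adam-style base update are assumptions made for the sketch.

```python
import numpy as np

def angular_coefficient(g_prev, g_curr, lam1=0.5, lam2=0.5):
    """Map the angle between two consecutive gradients to a bounded
    step-size multiplier (constants are illustrative, not the paper's)."""
    cos_theta = np.dot(g_prev, g_curr) / (
        np.linalg.norm(g_prev) * np.linalg.norm(g_curr) + 1e-12)
    # Aligned gradients (cos ≈ ±1) -> multiplier near 1 (confident step);
    # near-orthogonal gradients -> multiplier shrinks, damping zig-zag.
    return lam1 * np.tanh(abs(cos_theta)) + lam2

def angular_step(params, g_prev, g, m, v, t,
                 lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-like update whose step is scaled by the angular coefficient."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    phi = angular_coefficient(g_prev, g)     # angle-aware scaling
    params = params - lr * phi * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```

When consecutive gradients point in nearly the same (or exactly opposite) direction, `phi` approaches 1 and the step proceeds at full size; when they disagree, `phi` shrinks toward 0.5, which is one way to suppress the oscillation the article describes.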
To evaluate their proposed optimizer, the team first modelled optimization as a regression problem over three one-dimensional non-convex functions, using SGDM, Adam, diffGrad, AdaBelief, AngularGrad^cos and AngularGrad^tan to find the optimal solutions for these functions.
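This kind of benchmark can be reproduced in a few lines. The test function below is an illustrative non-convex example, not one of the three functions from the paper:

```python
import numpy as np

# A 1-D non-convex function with two distinct local minima
# (an illustrative choice, not taken from the paper):
f  = lambda x: x**4 - 3 * x**2 + x
df = lambda x: 4 * x**3 - 6 * x + 1   # its derivative

def minimize(x0, lr=0.01, steps=500):
    """Plain gradient descent; swap in any optimizer update to compare."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x

# Different starting points land in different local minima, which is
# what makes such functions useful as optimizer benchmarks.
print(minimize(-2.0), minimize(2.0))
```

Running the same loop with each candidate optimizer's update rule and plotting the trajectories is essentially how such comparisons visualize convergence speed and smoothness.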
The researchers say the stable training performance of AngularGrad in complex settings for both image and fine-grained classification tasks shows that it has good generalization ability. The results of comprehensive experiments on computer vision tasks with different optimizers further validate AngularGrad's ability to improve training efficiency and performance.
Overall, the empirical results demonstrate that the AngularGrad optimizer generates a more accurate step size and achieves faster and smoother convergence.
The team has made the source code available on the project GitHub. The paper AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang