Tag: Neural Network

AI Machine Learning & Data Science Research

Georgia Tech & Microsoft Reveal ‘Super Tickets’ in Pretrained Language Models: Improving Model Compression and Generalization

A research team from Georgia Tech, Microsoft Research and Microsoft Azure AI studies the collections of “lottery tickets” in extremely over-parametrized models, revealing the generalization performance pattern of winning tickets and proving the existence of “super tickets.”
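
For readers unfamiliar with the lottery ticket procedure this summary refers to: winning tickets are typically identified by magnitude pruning, i.e. keeping only the largest-magnitude weights and retraining the resulting sparse subnetwork. The sketch below is a generic PyTorch illustration of that idea, not the paper's exact algorithm; the function names and the 50 percent sparsity default are assumptions made for illustration.

```python
import torch

def magnitude_prune_masks(model, sparsity=0.5):
    """Keep only the largest-magnitude weights; everything else is masked out."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() > 1:  # prune weight matrices, skip biases
            k = max(1, int(param.numel() * sparsity))
            threshold = param.detach().abs().flatten().kthvalue(k).values
            masks[name] = (param.detach().abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    """Zero out pruned weights in place, leaving the sparse 'ticket' subnetwork."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
```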

AI Machine Learning & Data Science Natural Language Tech Research

Study Shows Transformers Possess the Compositionality Power for Mathematical Reasoning

A research team from UC Davis, Microsoft Research and Johns Hopkins University extends prior work, which trained models on massive amounts of linguistic data to reveal the grammatical structures encoded in their representations, to the domain of mathematical reasoning, showing that both the standard transformer and the TP-Transformer can compose the meanings of mathematical symbols based on their structured relationships.
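
The TP-Transformer mentioned above takes its name from tensor product representations, which bind each symbol ("filler") to a structural position ("role"). As a rough, assumed illustration of that binding idea, not the paper's architecture, here is a toy NumPy sketch:

```python
import numpy as np

# Toy tensor-product-style binding: each symbol embedding ("filler") is bound
# to a structural position ("role") by an outer product, and the bindings sum.
def bind(fillers, roles):
    return sum(np.outer(f, r) for f, r in zip(fillers, roles))

# With orthonormal role vectors, a bound filler is recovered by projecting the
# sum onto the corresponding role vector ("unbinding").
def unbind(bound, role):
    return bound @ role

roles = np.eye(3)                    # orthonormal roles for 3 positions
fillers = np.random.randn(3, 4)      # 4-dimensional embeddings for 3 symbols
bound = bind(fillers, roles)
assert np.allclose(unbind(bound, roles[0]), fillers[0])
```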

AI Machine Learning & Data Science Popular Research

ETH Zürich Identifies Priors That Boost Bayesian Deep Learning Models

A research team from ETH Zürich presents an overview of priors for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks. The researchers argue that well-chosen priors can deliver desirable theoretical and empirical properties such as improved uncertainty estimation, model selection and optimal decision support, and they provide guidance on how to choose them.
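
As a concrete, assumed example of what a prior choice looks like in practice (not one of the paper's proposals), the following NumPy sketch draws sample functions from a Gaussian process prior, where the RBF kernel's lengthscale encodes a belief about smoothness before any data is seen:

```python
import numpy as np

# Sample functions from a Gaussian process prior with an RBF kernel. The
# lengthscale and variance are the prior choices; the values here are
# illustrative placeholders.
def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(-3.0, 3.0, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # jitter for numerical stability
prior_draws = np.random.multivariate_normal(np.zeros(len(x)), K, size=5)
```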

AI Machine Learning & Data Science Popular Research

Bronstein, Bruna, Cohen and Veličković Leverage the Erlangen Programme to Establish the Geometric Foundations of Deep Learning

Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to geometrically unify the typical architectures of CNNs, GNNs, LSTMs, Transformers, etc. from the perspective of symmetry and invariance to build an “Erlangen Programme” for deep neural networks.
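
The symmetry-and-invariance viewpoint can be made concrete with a small, assumed example: sum pooling over a graph's node features is invariant to how the nodes are ordered, which is one of the invariances GNN architectures are built around. A toy NumPy check:

```python
import numpy as np

# Sum pooling over node features is invariant to node ordering: permuting the
# rows of the feature matrix does not change the pooled graph representation.
node_features = np.random.randn(5, 8)      # 5 nodes, 8 features each
perm = np.random.permutation(5)

assert np.allclose(node_features.sum(axis=0),
                   node_features[perm].sum(axis=0))
```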

AI Machine Learning & Data Science Research

CMU, UT Austin & Facebook’s CNN Layer Width Optimization Strategies Achieve 320x Overhead Reduction

Researchers from Carnegie Mellon University, the University of Texas at Austin and Facebook AI propose a novel paradigm for optimizing the width of each CNN layer. The method is compatible with various width optimization algorithms and networks, and achieves up to a 320x reduction in width optimization overhead without compromising top-1 accuracy on ImageNet.
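
To make the search space concrete, here is a minimal, assumed PyTorch sketch of a CNN whose per-layer channel widths are an explicit configuration vector, the kind of vector that width optimization methods tune; this is illustrative, not the paper's method:

```python
import torch.nn as nn

# A tiny CNN whose per-layer channel widths are a configurable tuple; width
# optimization searches over such tuples to trade accuracy against compute.
# The widths below are placeholder values.
def build_cnn(widths=(32, 64, 128), in_channels=3, num_classes=1000):
    layers, c_in = [], in_channels
    for c_out in widths:
        layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                   nn.BatchNorm2d(c_out),
                   nn.ReLU(inplace=True)]
        c_in = c_out
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c_in, num_classes)]
    return nn.Sequential(*layers)

model = build_cnn(widths=(16, 32, 64))   # a narrower candidate configuration
```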

AI Research

DeepMind AI Flunks High School Math Test

DeepMind trained and tested its neural model by first collecting a dataset consisting of different types of mathematics problems. Rather than crowd-sourcing the data, the researchers synthesized it, which allowed them to generate a larger number of training examples, control the difficulty level and reduce training time.
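
As a hedged illustration of what procedural synthesis with a difficulty control might look like (hypothetical code, not DeepMind's actual generator):

```python
import random

# Procedurally synthesize arithmetic questions: the difficulty knob controls
# operand size and the number of operations, mirroring the idea of generating
# training data rather than crowd-sourcing it. Names and knobs are illustrative.
def make_problem(difficulty=1):
    n_terms = difficulty + 1
    terms = [random.randint(0, 10 ** difficulty) for _ in range(n_terms)]
    ops = [random.choice(['+', '-', '*']) for _ in range(n_terms - 1)]
    expression = str(terms[0])
    for op, term in zip(ops, terms[1:]):
        expression += f' {op} {term}'
    return f'What is {expression}?', str(eval(expression))

question, answer = make_problem(difficulty=2)
```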

AI Interview

Google Brain Simplifies Network Learning Dynamics Characterization Under Gradient Descent

Machine learning models based on deep neural networks have achieved unprecedented performance on many tasks. These models are generally regarded as complex systems that are difficult to analyze theoretically. Moreover, because the optimization process is typically governed by a high-dimensional, non-convex loss surface, it is very challenging to describe the gradient-based training dynamics of these models.
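
One widely used way to make these dynamics tractable, and the regime commonly associated with this line of Google Brain work, is to study very wide networks, where the model trained by gradient descent stays close to its first-order Taylor expansion around initialization. The following is a sketch of that linearization in our own notation, not a statement taken from the paper:

```latex
% Linearized view of training: the network function at step t is approximated
% by its first-order Taylor expansion around the initial parameters.
f_{\theta_t}(x) \;\approx\; f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^{\top}\,(\theta_t - \theta_0)
```

Under this approximation, gradient descent on a squared loss drives the predictions according to a fixed kernel built from the gradients at initialization (the neural tangent kernel), so the training dynamics can be characterized in closed form.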