A research team from University Medical Center Freiburg, ML Collective, and Google Brain introduces SimpleBits — an information-reduction method that learns to synthesize simplified inputs that contain less information yet remain informative for the task, providing a new approach for exploring the basis of network decisions.
A team from Google Research, University of Pennsylvania and Cornell University proposes a principled way to filter out common memorization in neural language models (LMs). The researchers introduce “counterfactual memorization”, which measures the expected change in a model’s prediction on a training example when that example is left out of training, to distinguish “rare” (episodic) memorization from “common” (semantic) memorization.
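Counterfactual memorization can be estimated by training many models on random subsets of the data, then comparing each example's score between the models that saw it and the models that did not. Below is a minimal NumPy sketch. It assumes the per-model scores (for example, per-token accuracy) and the subset-membership mask are already available. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def counterfactual_memorization(scores, in_train):
    """Estimate counterfactual memorization for every example (hypothetical helper).

    scores:   (num_models, num_examples) score of model i on example j,
              e.g. per-token accuracy.
    in_train: (num_models, num_examples) bool, True if example j was in
              model i's random training subset.
    Returns mean score when included minus mean score when excluded.
    """
    scores = np.asarray(scores, dtype=float)
    in_train = np.asarray(in_train, dtype=bool)
    n_in = in_train.sum(axis=0)
    n_out = (~in_train).sum(axis=0)
    if np.any(n_in == 0) or np.any(n_out == 0):
        raise ValueError("each example must be both included and excluded at least once")
    # Average each example's score over models that did / did not train on it.
    mean_in = np.where(in_train, scores, 0.0).sum(axis=0) / n_in
    mean_out = np.where(~in_train, scores, 0.0).sum(axis=0) / n_out
    return mean_in - mean_out

# Toy data: 4 models, 2 examples (hypothetical numbers).
scores = [[0.90, 0.80],
          [0.20, 0.80],
          [0.95, 0.70],
          [0.25, 0.70]]
in_train = [[True, True],
            [False, False],
            [True, False],
            [False, True]]
mem = counterfactual_memorization(scores, in_train)
```

In the toy data, the first example scores well only when it was trained on, which is rare memorization, so its estimate is about 0.7. The second example scores the same either way, which is common knowledge, so its estimate is 0.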
A research team from UC Davis, Microsoft Research and Johns Hopkins University extends prior work to the domain of mathematical reasoning. That work showed that models trained on massive amounts of linguistic data encode grammatical structure in their representations. The team shows that both the standard Transformer and the TP-Transformer can compose the meanings of mathematical symbols based on their structured relationships.
A research team from ETH Zürich presents an overview of priors for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks. The researchers argue that well-chosen priors can yield desirable theoretical and empirical properties, such as uncertainty estimation, model selection and optimal decision support, and they provide guidance on how to choose such priors.
Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to unify common architectures such as CNNs, GNNs, LSTMs and Transformers under a single geometric framework grounded in symmetry and invariance. Their goal is to build an “Erlangen Programme” for deep neural networks.
Researchers from Carnegie Mellon University, the University of Texas at Austin and Facebook AI propose a novel paradigm for optimizing the width of each CNN layer. The method is compatible with various width optimization algorithms and networks. It reduces width optimization overhead by up to 320x without compromising top-1 accuracy on ImageNet.
A research team from Google and the University of California, Berkeley calculates the energy use and carbon footprint of the large-scale models T5, Meena, GShard, Switch Transformer and GPT-3. The team also identifies methods and publication guidelines that could help reduce the models’ CO2e footprint.
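Operational emissions estimates of this kind multiply a training run's energy use by the carbon intensity of the electricity it consumes. Below is a minimal sketch. It assumes energy is estimated as training hours × number of processors × average power per processor × data center PUE. The function names and all input numbers are hypothetical, not figures from the paper.

```python
def training_energy_kwh(hours, num_processors, avg_watts_per_processor, pue):
    """Energy drawn by a training run, including data center overhead (PUE)."""
    processor_kwh = hours * num_processors * avg_watts_per_processor / 1000.0
    return processor_kwh * pue

def training_co2e_kg(energy_kwh, kg_co2e_per_kwh):
    """Operational emissions: energy times the grid's carbon intensity."""
    return energy_kwh * kg_co2e_per_kwh

# Hypothetical run: 100 hours on 512 accelerators averaging 300 W,
# a PUE of 1.1 and a grid intensity of 0.4 kg CO2e per kWh.
energy = training_energy_kwh(100, 512, 300, 1.1)  # about 16,896 kWh
co2e = training_co2e_kg(energy, 0.4)              # about 6,758 kg, i.e. ~6.8 t
```

Because the factors multiply, halving any one of them halves the estimate. Examples include a more efficient processor, a lower PUE or a cleaner grid.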
With AI models gaining power and momentum across a number of industries in recent years, meteorological researchers are now applying the tech in satellite data processing, nowcasting, typhoon and extreme weather forecasting and other business and environmental analytics areas.
Stanford researchers have proposed ImagineNet, a new tool for interface customisation. It lets both content creators and end users substantially restyle their apps’ interfaces while keeping content details clear enough to preserve usability.
To help users design and tune machine learning models, neural network architectures or complex system parameters efficiently and automatically, Microsoft Research began developing its Neural Network Intelligence (NNI) AutoML toolkit in 2017 and open-sourced version 1.0 in 2018.
DeepMind trained and tested its neural model on a purpose-built dataset covering different types of mathematics problems. Rather than crowd-sourcing the problems, the researchers synthesized them. This let them generate a larger number of training examples, control the difficulty level and reduce training time.
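Synthesizing problems instead of crowd-sourcing them turns difficulty into a tunable parameter of the generator. The toy sketch below covers arithmetic questions only. The functions and the difficulty knobs (number of terms, digits per term) are hypothetical and far simpler than DeepMind's actual generators.

```python
import random

def make_arithmetic_problem(rng, num_terms, max_digits):
    """Return a (question, answer) pair for a random chain of + and - (hypothetical).

    Difficulty grows with num_terms (how many numbers) and max_digits.
    """
    hi = 10 ** max_digits - 1
    terms = [rng.randint(0, hi) for _ in range(num_terms)]
    ops = [rng.choice("+-") for _ in range(num_terms - 1)]
    question = str(terms[0])
    answer = terms[0]
    # Evaluate left to right while building the question text.
    for op, term in zip(ops, terms[1:]):
        question += f" {op} {term}"
        answer = answer + term if op == "+" else answer - term
    return f"What is {question}?", str(answer)

def make_dataset(n, num_terms, max_digits, seed=0):
    """Generate n problems at a fixed difficulty; the seed makes it reproducible."""
    rng = random.Random(seed)
    return [make_arithmetic_problem(rng, num_terms, max_digits) for _ in range(n)]

easy = make_dataset(3, num_terms=2, max_digits=1)
hard = make_dataset(3, num_terms=5, max_digits=4)
```

Because examples are generated on demand, the training set can be made as large as needed. Harder test splits can be produced simply by changing the knobs.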
Andrew Brock, first author of the high-profile research paper Large Scale GAN Training for High Fidelity Natural Image Synthesis (aka “BigGAN”), has posted a GitHub repository of an unofficial PyTorch BigGAN implementation that requires only 4-8 GPUs to train the model.
In imperfect-information environments, the asynchronous neural fictitious self-play (ANFSP) method lets AI agents learn to make optimal decisions using multiple virtual environments. The approach has performed well in Texas Hold’em and in multiplayer first-person shooter (FPS) video games.
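ANFSP builds on fictitious self-play, in which each player repeatedly best-responds to the opponent's average strategy so far. The sketch below illustrates only that underlying idea, using classic tabular fictitious play on rock-paper-scissors. It is not ANFSP itself, which runs asynchronously and uses neural networks to approximate the best responses and average strategies. The function name is hypothetical.

```python
import numpy as np

# Row player's payoffs for Rock, Paper, Scissors (zero-sum: column gets the negative).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

def fictitious_play(payoff, iterations):
    """Classic two-player fictitious play in a zero-sum matrix game.

    Each round, both players best-respond to the opponent's empirical
    strategy so far. Returns the two empirical (average) strategies.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    row_counts[0] = col_counts[0] = 1.0  # arbitrary first move
    for _ in range(iterations):
        # Best-responding to raw counts equals best-responding to the
        # average strategy (positive scaling), and keeps ties exact.
        row_br = np.argmax(payoff @ col_counts)  # row maximizes its payoff
        col_br = np.argmin(row_counts @ payoff)  # column minimizes row's payoff
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

row_strategy, col_strategy = fictitious_play(A, 20000)
```

In two-player zero-sum games, these average strategies converge to a Nash equilibrium, which for rock-paper-scissors is uniform play. That averaging behaviour is what the neural variants try to preserve in much larger imperfect-information games.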
Machine learning models based on deep neural networks have achieved unprecedented performance on many tasks, but they are generally regarded as complex systems that are difficult to analyze theoretically. Moreover, because training is governed by a high-dimensional, non-convex loss surface, the gradient-based dynamics of these models during training are very challenging to describe.