A research team from DeepMind, Mila – University of Montreal and Google Brain proposes a neural network architecture that learns the graph structure of observational and/or interventional data via supervised training on synthetic graphs, making causal induction a black-box problem that generalizes well to new synthetic and naturalistic graphs.
In the new paper A Modern Self-Referential Weight Matrix That Learns to Modify Itself, a research team from The Swiss AI Lab, IDSIA, University of Lugano (USI) & SUPSI, and King Abdullah University of Science and Technology (KAUST) presents a scalable self-referential weight matrix (SRWM) that leverages outer products and the delta update rule to update and improve itself.
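The delta update rule mentioned above can be sketched in a few lines. This is a minimal illustration of a generic delta-rule update built from an outer product, not the paper's implementation: the function names `matvec` and `delta_update` are hypothetical, and the key, value and learning rate are passed in as plain arguments, whereas a self-referential matrix would produce such quantities itself.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def delta_update(W, key, value, lr):
    """Return W + lr * outer(value - W @ key, key).

    Hypothetical helper: the update moves the value currently stored
    under `key` toward the new `value`, leaving W itself unmodified.
    """
    old_value = matvec(W, key)                      # what W currently returns for key
    error = [v - o for v, o in zip(value, old_value)]
    return [
        [w + lr * e * k for w, k in zip(row, key)]  # one row of the outer product
        for row, e in zip(W, error)
    ]


W = [[0.0, 0.0], [0.0, 0.0]]
W = delta_update(W, key=[1.0, 0.0], value=[3.0, 4.0], lr=1.0)
retrieved = matvec(W, [1.0, 0.0])
```

With a unit-norm key and a learning rate of 1, querying the updated matrix with the same key returns the new value exactly; smaller learning rates interpolate between the old and new values.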
A Google Research team further explores the scaling approach for improving language modelling, leveraging the new Pathways distributed ML system to train a 540 billion parameter autoregressive transformer, Pathways Language Model (PaLM), that achieves state-of-the-art few-shot performance.
A research team from Sapienza University and OpenAI introduces an explanatory learning procedure that enables machines to understand existing explanations from symbolic sequences and create new explanations for unexplained phenomena, and further proposes Critical Rationalist Network (CRN) models for discovering explanations for novel phenomena.
A research team from University Medical Center Freiburg, ML Collective, and Google Brain introduces SimpleBits — an information-reduction method that learns to synthesize simplified inputs that contain less information yet remain informative for the task, providing a new approach for exploring the basis of network decisions.
A team from Google Research, University of Pennsylvania and Cornell University proposes a principled way to filter out common memorization in LMs, introducing “counterfactual memorization” to measure the expected change in a model’s prediction when a given training example is omitted from training, and thereby distinguish “rare” (episodic) memorization from “common” (semantic) memorization in neural LMs.
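One way to read the "expected change in a model's prediction" is as a difference of averages over many training runs: how well models score on an example when it was in their training subset, minus how well they score when it was not. The sketch below assumes that reading; the function name `counterfactual_memorization`, the `runs` data layout and the scores are all hypothetical and made up for illustration.

```python
from statistics import mean


def counterfactual_memorization(runs, example_id):
    """Estimate memorization of one example from several training runs.

    `runs` is a list of (train_ids, scores) pairs (a hypothetical layout):
    train_ids is the set of example ids a model was trained on, and
    scores maps each example id to that model's performance on it.
    """
    seen = [scores[example_id] for ids, scores in runs if example_id in ids]
    unseen = [scores[example_id] for ids, scores in runs if example_id not in ids]
    if not seen or not unseen:
        raise ValueError("need runs both with and without the example")
    return mean(seen) - mean(unseen)


# Illustrative, invented numbers: example 0 is only predicted well when trained on.
runs = [
    ({0, 1}, {0: 0.9, 1: 0.8, 2: 0.7}),
    ({1, 2}, {0: 0.3, 1: 0.8, 2: 0.8}),
    ({0, 2}, {0: 0.9, 1: 0.7, 2: 0.8}),
]
score_rare = counterfactual_memorization(runs, 0)
score_common = counterfactual_memorization(runs, 2)
```

Under this reading, an example the models handle well only when they have seen it gets a high score, while an example they handle about equally well either way gets a score near zero.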
A research team from UC Davis, Microsoft Research and Johns Hopkins University takes prior work, which trained models on massive amounts of linguistic data and revealed grammatical structures in their representations, into the domain of mathematical reasoning, showing that both the standard transformer and the TP-Transformer can compose the meanings of mathematical symbols based on their structured relationships.
A research team from ETH Zürich presents an overview of priors for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks. The researchers propose that well-chosen priors can achieve theoretical and empirical properties such as uncertainty estimation, model selection and optimal decision support; and provide guidance on how to choose them.
Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to geometrically unify the typical architectures of CNNs, GNNs, LSTMs, Transformers, etc. from the perspective of symmetry and invariance to build an “Erlangen Programme” for deep neural networks.
Researchers from Carnegie Mellon University, the University of Texas at Austin and Facebook AI propose a novel paradigm to optimize widths for each CNN layer. The method is compatible across various width optimization algorithms and networks and achieves up to a 320x reduction in width optimization overhead without compromising top-1 accuracy on ImageNet.
A research team from Google and the University of California, Berkeley calculates the energy use and carbon footprint of large-scale models T5, Meena, GShard, Switch Transformer and GPT-3, and identifies methods and publication guidelines that could help reduce their CO2e footprint.
With AI models gaining power and momentum across a number of industries in recent years, meteorological researchers are now applying the tech in satellite data processing, nowcasting, typhoon and extreme weather forecasting and other business and environmental analytics areas.
To help users design and tune machine learning models, neural network architectures or complex system parameters efficiently and automatically, Microsoft Research began developing its Neural Network Intelligence (NNI) AutoML toolkit in 2017, open-sourcing version 1.0 in 2018.
DeepMind trained and tested its neural model by first collecting a dataset consisting of different types of mathematics problems. Rather than crowd-sourcing, they synthesized the dataset to generate a larger number of training examples, control the difficulty level and reduce training time.
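To make the idea of a synthesized dataset with a controllable difficulty level concrete, here is a toy generator. It is only an illustration of the general approach and does not reproduce DeepMind's dataset or code: the function names, the question template and the notion of difficulty (number of digits in the operands) are all assumptions.

```python
import random


def make_problem(rng, difficulty):
    """Create one arithmetic question; operands have up to `difficulty` digits."""
    upper = 10 ** difficulty
    a, b = rng.randint(1, upper), rng.randint(1, upper)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"What is {a} {op} {b}?", str(answer)


def make_dataset(n, difficulty, seed=0):
    """Generate n (question, answer) pairs, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [make_problem(rng, difficulty) for _ in range(n)]


easy = make_dataset(3, difficulty=1)
hard = make_dataset(3, difficulty=4)
```

Because the examples are generated rather than collected, the dataset size is limited only by compute, and raising `difficulty` yields harder problems from the same template.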
Designed for incomplete-information environments, the asynchronous neural fictitious self-play (ANFSP) method allows AI to learn to make optimal decisions across multiple virtual environments. The approach has performed well in Texas Hold’em and multiplayer FPS video games.