Tag: Neural Network

AI Machine Learning & Data Science Research

Colossal-AI Seamlessly Accelerates Large Models at Low Costs with Hugging Face

HPC-AI Tech’s flagship open-source and large-scale AI system, Colossal-AI, now allows Hugging Face users to seamlessly develop their ML models in a distributed and easy manner.

AI Machine Learning & Data Science Research

DeepMind, Mila & Google Brain Enable Generalization Capabilities for Causal Graph Structure Induction

A research team from DeepMind, Mila – University of Montreal and Google Brain proposes a neural network architecture that learns the graph structure of observational and/or interventional data via supervised training on synthetic graphs, making causal induction a black-box problem that generalizes well to new synthetic and naturalistic graphs.

AI Machine Learning & Data Science Popular Research

Toward Self-Improving Neural Networks: Schmidhuber Team’s Scalable Self-Referential Weight Matrix Learns to Modify Itself

In the new paper A Modern Self-Referential Weight Matrix That Learns to Modify Itself, a research team from The Swiss AI Lab, IDSIA, University of Lugano (USI) & SUPSI, and King Abdullah University of Science and Technology (KAUST) presents a scalable self-referential weight matrix (SRWM) that leverages outer products and the delta update rule to update and improve itself.

AI Machine Learning & Data Science Nature Language Tech Research

Sapienza U & OpenAI Propose Explanatory Learning to Enable Machines to Understand and Create Explanations

A research team from Sapienza University and OpenAI introduces an explanatory learning procedure that enables machines to understand existing explanations from symbolic sequences and create new explanations for unexplained phenomena, and further proposes Critical Rationalist Network (CRN) models for discovering explanations for novel phenomena.

AI Machine Learning & Data Science Research

Less is More: Understanding Neural Network Decisions via Simplified Yet Informative Inputs

A research team from University Medical Center Freiburg, ML Collective, and Google Brain introduces SimpleBits — an information-reduction method that learns to synthesize simplified inputs that contain less information yet remain informative for the task, providing a new approach for exploring the basis of network decisions.

AI Machine Learning & Data Science Research

Counterfactual Memorization in Language Models: Distinguishing Rare from Common Memorization

A team from Google Research, University of Pennsylvania and Cornell University proposes a principled perspective to filter out common memorization for LMs, introducing “counterfactual memorization” to measure the expected change in a model’s prediction and distinguish “rare” (episodic) memorization from “common” (semantic) memorization in neural LMs.

AI Machine Learning & Data Science Nature Language Tech Research

Study Shows Transformers Possess the Compositionality Power for Mathematical Reasoning

A research team from UC Davis, Microsoft Research and Johns Hopkins University extends work on training massive amounts of linguistic data to reveal the grammatical structures in their representations to the domain of mathematical reasoning, showing that both the standard transformer and the TP-Transformer can compose the meanings of mathematical symbols based on their structured relationships.

AI Machine Learning & Data Science Popular Research

ETH Zürich Identifies Priors That Boost Bayesian Deep Learning Models

A research team from ETH Zürich presents an overview of priors for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks. The researchers propose that well-chosen priors can achieve theoretical and empirical properties such as uncertainty estimation, model selection and optimal decision support; and provide guidance on how to choose them.

AI Machine Learning & Data Science Popular Research

Bronstein, Bruna, Cohen and Velickovic Leverage the Erlangen Programme to Establish the Geometric Foundations of Deep Learning

Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to geometrically unify the typical architectures of CNNs, GNNs, LSTMs, Transformers, etc. from the perspective of symmetry and invariance to build an “Erlangen Programme” for deep neural networks.

AI Machine Learning & Data Science Research

CMU, UT Austin & Facebook’s CNN Layer Width Optimization Strategies Achieve 320x Overhead Reduction

Researchers from Carnegie Mellon University, the University of Texas at Austin and Facebook AI propose a novel paradigm to optimize widths for each CNN layer. The method is compatible across various width optimization algorithms and networks and achieves up to a 320x reduction in width optimization overhead without compromising top-1 accuracy on ImageNet.

AI Research

DeepMind AI Flunks High School Math Test

DeepMind trained and tested its neural model by first collecting a dataset consisting of different types of mathematics problems. Rather than crowd-sourcing, they synthesized the dataset to generate a larger number of training examples, control the difficulty level and reduce training time.

AI Research

BigGAN Trained With Only 4 GPUs!

Andrew Brock, first author of the high-profile research paper Large Scale GAN Training for High Fidelity Natural Image Synthesis (aka “BigGAN”), has posted a GitHub repository of an unofficial PyTorch BigGAN implementation that requires only 4-8 GPUs to train the model.