Neural Networks

by Synced 2022-04-04 5

Training Compute-Optimal Large Language Models: DeepMind’s 70B Parameter Chinchilla Outperforms 530B Parameter Megatron-Turing

In the new paper Training Compute-Optimal Large Language Models, a DeepMind research team posits that current large language models are significantly undertrained and, based on empirical outcomes of over 400 training runs, proposes three predictive approaches for optimally setting model size and training duration.

by Synced 2022-03-01 1

AI Machine Learning & Data Science Research

Cornell U & Google Brain’s FLASH Yields High Transformer Quality in Linear Time

A research team from Cornell University and Google Brain introduces FLASH, a model family that achieves quality on par with fully augmented transformers while maintaining linear scalability over the context size on modern accelerators.

by Synced 2022-02-28 0

AI Machine Learning & Data Science Research

Princeton U’s DataMUX Enables DNNs to Simultaneously and Accurately Process up to 40 Input Instances With Limited Computational Overhead

In the new paper DataMUX: Data Multiplexing for Neural Networks, a Princeton University research team proposes Data Multiplexing (DataMUX). The novel technique enables neural networks to process multiple inputs simultaneously and generate accurate predictions, increasing model throughput with minimal additional memory requirements.

by Synced 2022-02-23 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Tsinghua & NKU’s Visual Attention Network Combines the Advantages of Convolution and Self-Attention, Achieves SOTA Performance on CV Tasks

In the new paper Visual Attention Network, a research team from Tsinghua University and Nankai University introduces a novel large kernel attention (LKA) mechanism for an extremely simple and efficient Visual Attention Network (VAN) that significantly outperforms state-of-the-art vision transformers and convolutional neural networks on various computer vision tasks.

by Synced 2022-02-17 8

AI Machine Learning & Data Science Research

DeepMind & UCL Propose Neural Population Learning: An Efficient and General Framework That Learns Strategically Diverse Policies for Real-World Games

A research team from DeepMind and University College London proposes Neural Population Learning (NeuPL), an efficient and general framework that learns and represents diverse policies in symmetric zero-sum games within a single conditional network.

by Synced 2022-02-03 0

AI Machine Learning & Data Science Nature Language Tech Research

Microsoft & NVIDIA Leverage DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest Monolithic Language Model

A research team from Microsoft and NVIDIA leverages NVIDIA’s Megatron-LM and Microsoft’s DeepSpeed to create an efficient and scalable 3D parallel system that combines data, pipeline, and tensor-slicing based parallelism, achieving superior zero-, one-, and few-shot learning accuracies and new state-of-the-art results on NLP benchmarks.

by Synced 2022-01-27 0

AI Machine Learning & Data Science Research

Yann LeCun Team’s Neural Manifold Clustering and Embedding Method Surpasses High-Dimensional Clustering Algorithm Benchmarks

A team from UC Berkeley and Facebook AI Research proposes a Neural Manifold Clustering and Embedding (NMCE) method for general-purpose manifold clustering that significantly outperforms autoencoder-based deep subspace clustering approaches.

by Synced 2022-01-04 1

AI Machine Learning & Data Science Popular Research

A Neural Network Solves, Grades & Generates University-Level Mathematics Problems by Program Synthesis

In the new paper A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More, a research team from MIT, Columbia University, Harvard University and University of Waterloo proposes a neural network that can solve university-level mathematics problems via program synthesis.

by Synced 2021-12-03 2

AI Machine Learning & Data Science Research

Warsaw U, Google & OpenAI’s Terraformer Achieves a 37x Speedup Over Dense Baselines on 17B Transformer Decoding

In the new paper Sparse is Enough in Scaling Transformers, a research team from the University of Warsaw, Google Research and OpenAI proposes Scaling Transformers, a family of novel transformers that leverage sparse layers to scale efficiently and perform unbatched decoding much faster than original transformers, enabling fast inference on long sequences even with limited memory.

by Synced 2021-12-01 1

AI Machine Learning & Data Science Research

NeurIPS 2021 Announces Its 6 Outstanding Paper Awards, 2 Datasets and Benchmarks Track Best Paper Awards, and the Test of Time Award

The NeurIPS 2021 organizing committee has announced its paper awards, with six submissions receiving Outstanding Paper Awards, two papers recognized in the new Datasets and Benchmarks Track Best Paper Awards category, and one Test of Time Award.

by Synced 2021-11-16 0

AI Machine Learning & Data Science Research

Google Brain & Radboud U ‘Dive Into Chaos’ to Show Gradients Are Not All You Need in Dynamical Systems

In the new paper Gradients Are Not All You Need, a Google Brain and Radboud University research team discusses a “particularly sinister” chaos-based failure mode that appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics simulation to training learned optimizers.

by Synced 2021-11-12 1

AI Machine Learning & Data Science Research

DeepMind’s One Pass ImageNet: A New Benchmark for Resource Efficiency in Deep Learning

A DeepMind research team presents the One Pass ImageNet (OPIN) problem, designed to study the space and compute efficiency of deep learning in a streaming setting with constrained data storage and to develop model training systems where each example is passed to the system only once.

by Synced 2021-11-10 3

AI Machine Learning & Data Science Research

Microsoft India Proposes Varuna: Scalable, Low-Cost Training of Massive Deep Learning Models

A Microsoft Research India team presents Varuna, a system for training massive deep learning models on commodity networking that eliminates the need for specialized hyperclusters and alleviates the cost, scale, and resource utilization challenges of deep learning model training.

by Synced 2021-11-04 2

AI Machine Learning & Data Science Research

Washington U & Google Study Reveals How Attention Matrices Are Formed in Encoder-Decoder Architectures

In the new paper Understanding How Encoder-Decoder Architectures Attend, researchers from the University of Washington, Google Blueshift Team and Google Brain Team propose a method for decomposing hidden states over a sequence into temporal- and input-driven components, revealing how attention matrices are formed in encoder-decoder networks.

by Synced 2021-10-22 0

AI Machine Learning & Data Science Research

Deeper Is Not Necessarily Better: Princeton U & Intel’s 12-Layer Parallel Networks Achieve Performance Competitive With SOTA Deep Networks

In the new paper Non-deep Networks, a research team from Princeton University and Intel Labs argues it is possible to achieve high performance with “non-deep” neural networks, presenting ParNet (Parallel Networks), a novel 12-layer architecture that achieves performance competitive with its state-of-the-art deep counterparts.

by Synced 2021-10-15 4

AI Machine Learning & Data Science Research

Google Proposes ARDMs: Efficient Autoregressive Models That Learn to Generate in Any Order

A Google Research team introduces Autoregressive Diffusion Models (ARDMs), a model class encompassing and generalizing order-agnostic autoregressive models and discrete diffusion models that can generate variables in an arbitrary order and upscale variables.

by Synced 2021-10-14 1

AI Machine Learning & Data Science Research

Google Researchers Explore the Limits of Large-Scale Model Pretraining

A Google Research team conducts a systematic exploration comprising more than 4800 experiments on Vision Transformers, MLP-Mixers and ResNets with parameters ranging from 10 million to 10 billion, evaluated on more than 20 downstream image recognition tasks, aiming to capture the nonlinear relationships between performance on upstream and downstream tasks.

by Synced 2021-10-12 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Are Patches All You Need? New Study Proposes Patches Are Behind Vision Transformers’ Strong Performance

A research team proposes ConvMixer, an extremely simple model designed to support the argument that the impressive performance of vision transformers (ViTs) is mainly attributable to their use of patches as the input representation. The study shows that ConvMixer can outperform ViTs, MLP-Mixers and classical vision models.

by Synced 2021-10-05 1

AI Machine Learning & Data Science Research

DeepMind’s FIRE PBT: Automated Hyperparameter Tuning With Faster Model Training and Better Final Performance

A DeepMind research team proposes Faster Improvement Rate PBT (FIRE PBT), a variant of Population Based Training (PBT), an automated hyperparameter tuning method for neural network training. The novel approach achieves faster improvement rates and better long-term performance.

by Synced 2021-09-07 2

AI Machine Learning & Data Science Research

Swiss AI Lab Uses Simple Tricks to Dramatically Improve Transformers’ Systematic Generalization

A research team from The Swiss AI Lab IDSIA significantly improves the systematic generalization of transformer architectures, achieving accuracy up to 85 percent on the PCFG productivity split, and up to 81 percent on COGS.

by Synced 2021-08-25 3

AI Machine Learning & Data Science Nature Language Tech Research

Apple Neural TTS System Study: Combining Speakers of Multiple Languages to Improve Synthetic Voice Quality

An Apple research team explores multiple architectures and training procedures to develop a novel multi-speaker and multi-lingual neural TTS system. The study combines speech from 30 speakers from 15 locales in 8 languages, and demonstrates that for the vast majority of voices, such multi-lingual and multi-speaker models can yield better quality than single speaker models.

by Synced 2021-08-19 3

AI Machine Learning & Data Science Research

100+ Stanford Researchers Publish 200+ Page Paper on the AI Paradigm Shift Introduced by Large-Scale Models

In a 200+ page paper, Percy Liang, Fei-Fei Li, and over 100 other researchers from the Stanford University Center for Research on Foundation Models (CRFM) systematically describe the opportunities and risks of large-scale pretrained “foundation” models. The unique study aims to provide a clearer understanding of how these models work, when and how they fail, and the various capabilities provided by their emergent properties.

by Synced 2021-08-18 5

AI Machine Learning & Data Science Research

Logic Explained Deep Neural Networks: A General Approach to Explainable AI

A research team from Università di Firenze, Università di Siena, University of Cambridge and Université Côte d’Azur proposes a general approach to explainable artificial intelligence (XAI) in neural architectures, designing interpretable deep learning models called Logic Explained Networks (LENs). The novel approach yields better performance than established white-box models while providing more compact and meaningful explanations.

by Synced 2021-08-13 3

AI Machine Learning & Data Science Research

Poison Ink: A Stealthy, Robust, General, Invisible and Flexible Backdoor Attack Method

A research team from the University of Science and Technology of China, Microsoft Cloud AI, City University of Hong Kong and Wormpex AI Research proposes a robust and invisible backdoor attack called “Poison Ink” and demonstrates its immunity to state-of-the-art defence techniques.

by Synced 2021-08-11 2

AI Machine Learning & Data Science Research

Tokyo U & Preferred Networks Propose a Fast Estimation Method for the Stability of Ensemble Feature Selectors

A research team from Tokyo University and Preferred Networks proposes a fast simulation-based method for estimating the stability of ensemble selectors.

by Synced 2021-08-10 3

AI Machine Learning & Data Science Research

Novel Feature Importance-Aware Transferable Adversarial Attacks Dramatically Improve Transferability

A research team from Zhejiang University, Wuhan University and Adobe Research proposes Feature Importance-Aware Attacks (FIA) that drastically improve the transferability of adversarial examples, achieving superior performance compared to state-of-the-art transferable attacks.

by Synced 2021-08-09 4

AI Machine Learning & Data Science Research

DeepMind’s Perceiver IO: A General Architecture for a Wide Variety of Inputs & Outputs

A DeepMind research team proposes Perceiver IO, a single network that can easily integrate and transform arbitrary information for arbitrary tasks while scaling linearly with both input and output sizes. The general architecture achieves outstanding results on tasks with highly structured output spaces, such as natural language and visual understanding.

by Synced 2021-08-03 7

AI Machine Learning & Data Science Research

DeepMind & Google Use Neural Networks to Solve Mixed Integer Programs

A team from DeepMind and Google Research leverages neural networks to automatically construct effective heuristics from a dataset for mixed integer programming (MIP) problems. The approach significantly outperforms classical MIP solver techniques.

by Synced 2021-07-29 2

AI Machine Learning & Data Science Research

Google & Northwestern U Present Provably Efficient Learning Algorithms for Neural Networks

A research team from Google Research and Northwestern University presents polynomial time and sample-efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, aiming to provide insights into whether efficient algorithms exist for learning ReLU networks.

by Synced 2021-07-26 1

AI Machine Learning & Data Science Research

DeepMind’s Epistemic Neural Networks Open New Avenues for Uncertainty Modelling in Large and Complex DL Systems

A research team from DeepMind presents epistemic neural networks (ENNs) as an interface for uncertainty modelling in deep learning, and proposes the KL divergence from a target distribution as a precise metric to evaluate ENNs.

by Synced 2021-07-20 6

AI Machine Learning & Data Science Popular Research

DeepMind’s AlphaFold2 Predicts Protein Structures with Atomic-Level Accuracy

In a new paper published in the prestigious scientific journal Nature, DeepMind presents AlphaFold2, a redesigned neural-network system based on last year’s AlphaFold that can predict protein structures with atomic-level accuracy.

by Synced 2021-07-06 3

AI Computer Vision & Graphics Machine Learning & Data Science Popular Research

Facebook & UC Berkeley Substitute a Convolutional Stem to Dramatically Boost Vision Transformers’ Optimization Stability

A research team from Facebook AI and UC Berkeley finds a solution for vision transformers’ optimization instability problem by simply using a standard, lightweight convolutional stem for ViT models. The approach dramatically increases optimizer stability and improves peak performance without sacrificing computation efficiency.

by Synced 2021-06-24 2

AI Machine Learning & Data Science Research

Google Survey Explores Methods for Making DL Models ‘Smaller, Faster, and Better’

Researchers from Google conduct a survey on how to make Deep Learning models smaller, faster, and better. The team focuses on core areas of model efficiency, from modelling techniques to hardware support, and open-sources an experiment-based guide and code to help practitioners optimize their model training and deployment.

by Synced 2021-06-16 1

AI Machine Learning & Data Science Research

Bengio Team Proposes Flow Network-Based Generative Models That Learn a Stochastic Policy From a Sequence of Actions

A research team from Mila, McGill University, Université de Montréal, DeepMind and Microsoft proposes GFlowNet, a novel flow network-based generative method that can turn a given positive reward into a generative policy that samples with a probability proportional to the return.

by Synced 2021-06-10 2

AI Machine Learning & Data Science Research

IEEE Publishes Comprehensive Survey of Bottom-Up and Top-Down Neural Processing System Design

An IEEE team provides a comprehensive overview of the bottom-up and top-down design approaches toward neuromorphic intelligence, highlighting the different levels of granularity present in existing silicon implementations and assessing the benefits of the different circuit design styles in neural processing systems.

by Synced 2021-05-17 0

AI Machine Learning & Data Science Research

Google Presents New Parallelization Paradigm GSPMD for Common ML Computation Graphs: Constant Compilation Time With Increasing Devices

A research team from Google proposes GSPMD, an automatic parallelism system for ML computation graphs that uses simple tensor sharding annotations to achieve different parallelism paradigms in a unified way, including data parallelism, within-layer model parallelism, spatial partitioning, weight-update sharding, optimizer-state sharding and pipeline parallelism.

by Synced 2021-04-16 5

AI AIoT Machine Learning & Data Science Research

ETH Zurich Leverages Spiking Neural Networks To Build Ultra-Low-Power Neuromorphic Processors

A research team from ETH Zurich leverages existing spike-based learning circuits to propose a biologically plausible architecture that is highly successful in classifying distinct and complex spatio-temporal spike patterns. The work contributes to the design of ultra-low-power mixed-signal neuromorphic processing systems capable of distinguishing spatio-temporal patterns in spiking activity.

by Synced 2021-04-07 7

AI Machine Learning & Data Science Research

DeepMind, Microsoft, Allen AI & UW Researchers Convert Pretrained Transformers into RNNs, Lowering Memory Cost While Retaining High Accuracy

A research team from the University of Washington, Microsoft, DeepMind and Allen Institute for AI develops a method to convert pretrained transformers into efficient RNNs. The Transformer-to-RNN (T2R) approach speeds up generation and reduces memory cost.

by Synced 2021-03-16 2

AI Machine Learning & Data Science Research

Model Scaling That’s Both Accurate and Fast: Facebook AI Proposes Novel Scaling Analysis Framework and Strategy

A Facebook AI research team explores strategies for convolutional neural network scaling, aiming to provide a framework for analyzing scaling strategies under various computational constraints.

by Synced 2021-02-26 8

AI Machine Learning & Data Science Popular Research

Better Than Capsules? Geoffrey Hinton’s GLOM Idea Represents Part-Whole Hierarchies in Neural Networks

A research team led by Geoffrey Hinton has created an imaginary vision system called GLOM that enables neural networks with fixed architecture to parse an image into a part-whole hierarchy with different structures for each image.