Latest Posts

AI Machine Learning & Data Science Research

Meta AI’s Sparse All-MLP Model Doubles Training Efficiency Compared to Transformers

Researchers from Meta AI and the State University of New York at Buffalo propose sparsely-activated all-MLP architectures (sMLPs) that achieve training efficiency improvements of up to 2x compared to transformer-based mixture-of-experts (MoE) architectures, transformers, and gMLP.
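The efficiency gain comes from sparse activation: each token is routed to a single expert MLP, so adding experts grows model capacity without growing per-token compute. A toy, pure-Python sketch of top-1 routing (illustrative only — the function and weight names here are invented, and the actual sMLP uses learned sparse gating over both feature and token dimensions):

```python
def top1_route(scores):
    """Pick the index of the highest-scoring expert for one token."""
    return max(range(len(scores)), key=lambda i: scores[i])

def sparse_mlp_layer(tokens, expert_weights, gate_weights):
    """Each token picks one expert via a linear gate; only that expert's
    MLP runs, so FLOPs per token stay constant as experts are added."""
    outputs = []
    for tok in tokens:
        # gate scores: one dot product per expert
        scores = [sum(g * x for g, x in zip(gw, tok)) for gw in gate_weights]
        e = top1_route(scores)
        # run only expert e's MLP (a single ReLU layer here)
        w = expert_weights[e]
        outputs.append([max(0.0, sum(wi * x for wi, x in zip(row, tok)))
                        for row in w])
    return outputs
```

With two experts and a gate that routes each token by its dominant feature, the first token below runs through expert 0 (identity weights) and the second through expert 1 (doubling weights), each touching only one expert's parameters.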

AI Machine Learning & Data Science Research

Ithaca Paper Published in Nature: The First DNN Designed for Textual Restoration and Geographical and Chronological Attribution of Ancient Greek Inscriptions

A research team from DeepMind, Ca’ Foscari University of Venice, University of Oxford and Athens University of Economics and Business introduces Ithaca, a deep neural network (DNN) designed for textual restoration and geographical and chronological attribution of ancient Greek inscriptions.

AI Machine Learning & Data Science Research

Microsoft & OpenAI’s µTransfer Zero-Shot Hyperparameter Transfer Method Tunes GPT-3’s Hyperparameters on a Single GPU

In the new paper Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer, Microsoft and OpenAI researchers propose µTransfer, a method that leverages Maximal Update Parametrization (µP) to zero-shot transfer hyperparameters from small models and obtain near-optimal parameters on large models without directly tuning them.

AI Machine Learning & Data Science Research

OpenAI’s AutoDIME: Automating Multi-Agent Environment Design for RL Agents

In the new paper AutoDIME: Automatic Design of Interesting Multi-Agent Environments, an OpenAI research team explores automatic environment design for multi-agent environments using an RL-trained teacher that samples environments to maximize student learning. The work demonstrates that intrinsic teacher rewards are a promising approach for automating both single and multi-agent environment design.

AI Machine Learning & Data Science Research

DeepMind Trains AI Agents Capable of Robust Real-Time Cultural Transmission Without Human Data

In the new paper Learning Robust Real-Time Cultural Transmission Without Human Data, a DeepMind research team proposes a procedure for training artificially intelligent agents capable of flexible, high-recall, robust real-time cultural transmission from human co-players in a rich 3D physical simulation without using human data in the training pipeline.

AI Machine Learning & Data Science Research

Meet TQP: The First Query Processor to Run on Tensor Computation Runtimes, Delivering up to 20x Speedups Over CPU-Only Systems

A research team from the University of Washington, UC San Diego and Microsoft prototypes Tensor Query Processor (TQP), a query processor that runs atop tensor computation runtimes (TCRs) such as PyTorch, TVM, and ONNX Runtime, improving query execution time by up to 20x over CPU-only systems and up to 5x over specialized GPU solutions.
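TQP's core idea is compiling relational operators into tensor operations that a TCR can dispatch to GPUs. A minimal illustration of the pattern — a filtered aggregate (roughly, SELECT SUM(v) WHERE p > t) expressed as mask, multiply, reduce; in TQP these would be actual tensor ops on a runtime like PyTorch, not Python lists:

```python
def masked_sum(values, predicate_col, threshold):
    """A filtered SUM as three tensor-style ops:
    build a 0/1 mask, multiply it in, then reduce."""
    mask = [1.0 if p > threshold else 0.0 for p in predicate_col]
    return sum(v * m for v, m in zip(values, mask))
```

The point of the rewrite is that every step is a dense, branch-free bulk operation, which is exactly the workload tensor runtimes already optimize.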

AI Machine Learning & Data Science Research

Princeton U’s DataMUX Enables DNNs to Simultaneously and Accurately Process up to 40 Input Instances With Limited Computational Overhead

In the new paper DataMUX: Data Multiplexing for Neural Networks, a Princeton University research team proposes Data Multiplexing (DataMUX). The novel technique enables neural networks to process multiple inputs simultaneously and generate accurate predictions, increasing model throughput with minimal additional memory requirements.
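DataMUX superposes several inputs into one shared representation, runs the network once, and recovers a prediction per input afterwards. A toy analogue using orthogonal codes shows why superposed inputs can remain separable (the real method uses learned linear multiplexing and a demultiplexing head over vector inputs; the two-slot codes here are purely illustrative):

```python
# one orthogonal "key" per multiplexed input slot
CODES = [[1, 1], [1, -1]]

def mux(xs):
    """Superpose two scalar inputs into one shared two-slot signal."""
    return [sum(x * c[j] for x, c in zip(xs, CODES)) for j in range(2)]

def demux(m, i):
    """Recover input i by correlating the mixed signal with its key."""
    c = CODES[i]
    return sum(mj * cj for mj, cj in zip(m, c)) / 2
```

Because the codes are orthogonal, each input is exactly recoverable from the mixture — the intuition behind processing many instances in one forward pass.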

AI Computer Vision & Graphics Machine Learning & Data Science Research

DeepMind’s Upgraded Hierarchical Perceiver Is Faster, Scales to Larger Data Without Preprocessing, and Delivers Higher Resolution and Accuracy

DeepMind researchers propose Hierarchical Perceiver (HiP), a model that retains the original Perceiver’s ability to process arbitrary modalities but is faster, can scale up to even more inputs/outputs, reduces the need for input engineering, and improves both efficiency and accuracy on classical computer vision benchmarks.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Tsinghua & NKU’s Visual Attention Network Combines the Advantages of Convolution and Self-Attention, Achieves SOTA Performance on CV Tasks

In the new paper Visual Attention Network, a research team from Tsinghua University and Nankai University introduces a novel large kernel attention (LKA) mechanism for an extremely simple and efficient Visual Attention Network (VAN) that significantly outperforms state-of-the-art vision transformers and convolutional neural networks on various computer vision tasks.
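LKA's trick is approximating an expensive large-kernel convolution by composing a small depthwise convolution with a depthwise dilated convolution (plus a pointwise convolution and an attention-style multiplication, omitted here). A 1-D sketch, not the authors' code, showing that a 3-tap convolution followed by a 3-tap dilation-2 convolution covers a 7-tap receptive field:

```python
def conv1d(x, kernel, dilation=1):
    """'Same'-padded 1-D convolution with optional dilation."""
    span = (len(kernel) - 1) * dilation
    out = []
    for i in range(len(x)):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = i - span // 2 + j * dilation
            if 0 <= idx < len(x):
                s += w * x[idx]
        out.append(s)
    return out

# push an impulse through small conv -> dilated conv and read off the
# positions it reaches: 7 taps' worth of receptive field from 6 weights
impulse = [0.0] * 15
impulse[7] = 1.0
y = conv1d(conv1d(impulse, [1.0, 1.0, 1.0]), [1.0, 1.0, 1.0], dilation=2)
field = [i for i, v in enumerate(y) if v != 0.0]
```

The same decomposition in 2-D is what lets VAN afford very large effective kernels at near-depthwise cost.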

AI Machine Learning & Data Science Research

Transformers Meet Online RL: New Study Unifies Offline Pretraining and Online Finetuning, Achieves SOTA Results

A team from Facebook AI Research, UC Berkeley and UCLA proposes Online Decision Transformers (ODT), an RL algorithm based on sequence modelling that incorporates offline pretraining and online finetuning in a unified framework and achieves performance competitive with the state-of-the-art models on the D4RL benchmark.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google’s MaskGIT Outperforms SOTA Transformer Models on Conditional Image Generation and Accelerates Autoregressive Decoding by up to 64x

A Google Research team proposes Masked Generative Image Transformer (MaskGIT), a novel image synthesis paradigm that uses a bidirectional transformer decoder. MaskGIT significantly outperforms state-of-the-art transformer models on the ImageNet dataset and accelerates autoregressive decoding by up to 64x.
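The decoding speedup comes from predicting all masked tokens in parallel over a small, fixed number of refinement steps: at each step the model predicts every masked position, keeps only its most confident predictions, and re-masks the rest according to a schedule. A sketch of the cosine mask schedule (the cosine shape follows the paper; the function name is ours):

```python
import math

def mask_schedule(seq_len, steps):
    """How many tokens remain masked after each of `steps` parallel
    refinement passes, following a cosine decay to zero."""
    return [math.floor(seq_len * math.cos(math.pi / 2 * t / steps))
            for t in range(1, steps + 1)]
```

For a 256-token image this reaches zero masked tokens in 8 passes, versus 256 sequential steps for left-to-right autoregressive decoding.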

AI Machine Learning & Data Science Research

Introducing Alpa: A Compiler Architecture for Automated Model-Parallel Distributed Training That Outperforms Hand-Tuned Strategies

A research team from UC Berkeley, Amazon Web Services, Google, Shanghai Jiao Tong University and Duke University proposes Alpa, a compiler system for distributed deep learning on GPU clusters that automatically generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on the models they were designed for.

AI Machine Learning & Data Science Research

OpenAI’s Statement Curriculum Learning Method Cracks High School Olympiad Level Mathematics Problems

An OpenAI research team presents an expert iteration-based neural theorem prover capable of solving a curriculum of increasingly difficult mathematical problems (such as high-school olympiad-level problems) from a set of formal statements of sufficiently varied difficulty and without the need for associated ground-truth proofs.

AI Machine Learning & Data Science Natural Language Tech Research

Microsoft & NVIDIA Leverage DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest Monolithic Language Model

A research team from Microsoft and NVIDIA leverages NVIDIA's Megatron-LM and Microsoft's DeepSpeed to create an efficient and scalable 3D parallel system that combines data, pipeline, and tensor-slicing based parallelism, achieving superior zero-, one-, and few-shot learning accuracies and new state-of-the-art results on NLP benchmarks.

AI Machine Learning & Data Science Natural Language Tech Research

Sapienza U & OpenAI Propose Explanatory Learning to Enable Machines to Understand and Create Explanations

A research team from Sapienza University and OpenAI introduces an explanatory learning procedure that enables machines to understand existing explanations from symbolic sequences and create new explanations for unexplained phenomena, and further proposes Critical Rationalist Network (CRN) models for discovering explanations for novel phenomena.

AI Machine Learning & Data Science Research

New Study Revisits Laplace Approximation, Validating It as an ‘Effortless’ Method for Bayesian Deep Learning

In the new paper Laplace Redux — Effortless Bayesian Deep Learning, a research team from the University of Cambridge, University of Tübingen, ETH Zurich and DeepMind conducts extensive experiments demonstrating that the Laplace approximation (LA) is a simple and cost-efficient yet competitive approximation method for inference in Bayesian deep learning.
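The Laplace approximation fits a Gaussian at the posterior mode, with covariance given by the inverse Hessian of the negative log posterior — which is what makes it cheap to bolt onto an already-trained network. A 1-D sketch for intuition (grid-based MAP search and finite-difference curvature; the paper's experiments are on deep networks, not toy densities):

```python
def laplace_approx(neg_log_post, lo, hi, n=10001):
    """1-D Laplace approximation: locate the MAP on a grid, then fit a
    Gaussian with variance 1 / f''(MAP) via a central finite difference."""
    h = (hi - lo) / (n - 1)
    grid = [lo + i * h for i in range(n)]
    mode = min(grid, key=neg_log_post)
    # curvature of the negative log posterior at the mode
    f2 = (neg_log_post(mode + h) - 2 * neg_log_post(mode)
          + neg_log_post(mode - h)) / h ** 2
    return mode, 1.0 / f2
```

For a quadratic negative log posterior (theta - 2)^2 the fit is exact: mean 2, variance 1/2.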

AI Machine Learning & Data Science Research

Meet Hyper-Tune: New SOTA Efficient Distributed Automatic Hyperparameter Tuning at Scale

A research team from Peking University, ETH Zürich and Kuaishou Technology proposes Hyper-Tune, an efficient and robust distributed hyperparameter-tuning framework that features system optimizations such as automatic resource allocation, asynchronous scheduling and a multi-fidelity optimizer, and achieves state-of-the-art performance on multiple tuning tasks.
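A multi-fidelity optimizer spends small budgets triaging many configurations and reserves large budgets for survivors. A sketch of successive halving, one common multi-fidelity scheme (illustrative only — Hyper-Tune's actual optimizer, resource allocation, and asynchronous scheduling are considerably more sophisticated):

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2, rounds=3):
    """Evaluate all configs at a cheap budget, keep the best 1/eta,
    and re-evaluate the survivors at eta times the budget."""
    budget = min_budget
    survivors = list(configs)
    for _ in range(rounds):
        scores = {c: evaluate(c, budget) for c in survivors}  # lower is better
        survivors.sort(key=lambda c: scores[c])
        survivors = survivors[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]
```

With 8 configurations and 3 rounds, only one configuration ever sees the full budget, which is where the efficiency comes from.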

AI Machine Learning & Data Science Research

Less is More: Understanding Neural Network Decisions via Simplified Yet Informative Inputs

A research team from University Medical Center Freiburg, ML Collective, and Google Brain introduces SimpleBits — an information-reduction method that learns to synthesize simplified inputs that contain less information yet remain informative for the task, providing a new approach for exploring the basis of network decisions.