A Tsinghua University research team proposes Stochastic Scheduled SAM (SS-SAM), a novel and efficient DNN training scheme that achieves comparable or better training performance at much lower computation cost than the baseline sharpness-aware minimization (SAM) training scheme.
A DeepMind research team argues that the mathematical description of symmetries in group theory is an important foundation that determines the structure of the universe, constrains the nature of natural tasks, and consequently shapes both biological and artificial intelligence. The study proposes symmetry transformations as a fundamental principle for defining what makes good representations.
A Google research team addresses conventional transformers’ resource-heavy training and fine-tuning requirements for learning new knowledge, proposing Memorizing Transformers as a step toward language models that can simply read and memorize new data at inference time for immediate knowledge acquisition.
A team from Google Research and the Swiss AI Lab IDSIA proposes the Block-Recurrent Transformer, a novel long-sequence processing approach that has the same computation time and parameter count costs as a conventional transformer layer but achieves significant perplexity improvements in language modelling tasks over very long sequences.
Researchers from Meta AI and the State University of New York at Buffalo propose sparsely-activated all-MLP architectures (sMLPs) that achieve training efficiency improvements of up to 2x compared to transformer-based mixture-of-experts (MoE) architectures, transformers, and gMLP.
In the new paper Deep AutoAugment, a research team from Michigan State University and Amazon Web Services proposes Deep AutoAugment (DeepAA), a fully automated multi-layer data augmentation search method that eliminates the need for hand-crafted default transformations.
A research team from the National University of Singapore, HPC-AI Technology Inc., Helixon and Shanghai Jiao Tong University proposes FastFold, a highly efficient protein structure prediction model for training and inference that reduces AlphaFold 2’s training time from 11 days to 67 hours.
An Idiap Research Institute team proposes a novel multi-layer perceptron (MLP) model, HyperMixer, as a Green AI alternative to transformers. HyperMixer achieves comparable performance with substantially lower costs in terms of processing time, training data and hyperparameter tuning.
A research team from DeepMind, Ca’ Foscari University of Venice, University of Oxford and Athens University of Economics and Business introduces Ithaca, a deep neural network (DNN) designed for textual restoration and geographical and chronological attribution of ancient Greek inscriptions.
In the new paper Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer, Microsoft and OpenAI researchers propose µTransfer, a method that leverages Maximal Update Parametrization (µP) to zero-shot transfer hyperparameters from small models and obtain near-optimal parameters on large models without directly tuning them.
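The transfer idea can be sketched in a few lines: tune a hyperparameter on a narrow proxy model, then reuse it at scale by applying a µP rescaling rule. This is a simplified illustration only; the 1/width factor shown is the µP scaling for hidden-weight Adam learning rates, while the names, widths, and base learning rate are hypothetical.

```python
# Hypothetical proxy-model settings: width of the small model that was tuned
# and the "optimal" learning rate found on it.
BASE_WIDTH = 256
TUNED_LR = 3e-4

def mup_hidden_lr(width, base_width=BASE_WIDTH, base_lr=TUNED_LR):
    """Hidden-layer Adam LR under muP: shrinks like 1/width as the model widens,
    so the hyperparameter tuned on the proxy stays near-optimal at scale."""
    return base_lr * base_width / width

print(mup_hidden_lr(256))   # unchanged at the proxy width: 0.0003
print(mup_hidden_lr(8192))  # same tuned HP, transferred to a 32x wider model
```

The point of µTransfer is that this rescaling (plus matching init rules) is all that stands between the small-model sweep and the large model, so the large model is never tuned directly.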
In the new paper AutoDIME: Automatic Design of Interesting Multi-Agent Environments, an OpenAI research team explores automatic environment design for multi-agent environments using an RL-trained teacher that samples environments to maximize student learning. The work demonstrates that intrinsic teacher rewards are a promising approach for automating both single and multi-agent environment design.
In the new paper Learning Robust Real-Time Cultural Transmission Without Human Data, a DeepMind research team proposes a procedure for training artificially intelligent agents capable of flexible, high-recall, robust real-time cultural transmission from human co-players in a rich 3D physical simulation without using human data in the training pipeline.
A research team from the University of Washington, UC San Diego and Microsoft prototypes Tensor Query Processor (TQP), a query processor that runs atop tensor computation runtimes (TCRs) such as PyTorch, TVM, and ONNX Runtime, improving query execution time by up to 20x over CPU-only systems and up to 5x over specialized GPU solutions.
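The core idea behind TQP, running relational queries on tensor runtimes, can be illustrated with a toy column-store query expressed as array operations. This is a minimal sketch, not TQP itself; the table and column names are hypothetical, and NumPy stands in for a TCR such as PyTorch.

```python
import numpy as np

# A tiny "table" stored column-wise as tensors (hypothetical sales data).
price = np.array([10.0, 25.0, 7.5, 40.0])
qty = np.array([2, 1, 5, 3])

# SELECT SUM(price * qty) FROM sales WHERE price > 9
mask = price > 9.0                          # WHERE clause -> boolean tensor
revenue = np.sum(price[mask] * qty[mask])   # projection + aggregation as tensor ops

print(revenue)  # -> 165.0
```

Because filters, joins, and aggregations all reduce to such tensor kernels, the same query plan can run unchanged on whatever hardware the underlying tensor runtime supports.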
In the new paper DataMUX: Data Multiplexing for Neural Networks, a Princeton University research team proposes Data Multiplexing (DataMUX). The novel technique enables neural networks to process multiple inputs simultaneously and generate accurate predictions, increasing model throughput with minimal additional memory requirements.
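The multiplexing step can be sketched as follows: each of N inputs is passed through a fixed instance-specific transform, and the results are averaged into a single vector that occupies one input slot. This is a toy sketch under assumptions (random fixed projections, averaging as the combine op); the learned demultiplexer and per-slot prediction heads that recover individual outputs are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_inputs = 8, 4  # hidden size and number of instances multiplexed together

xs = rng.normal(size=(n_inputs, d))                        # n distinct inputs
projections = rng.normal(size=(n_inputs, d, d)) / np.sqrt(d)

# Multiplex: apply an instance-specific transform, then average into ONE vector,
# so the network processes n inputs for roughly the cost of one.
mixed = np.mean([projections[i] @ xs[i] for i in range(n_inputs)], axis=0)
print(mixed.shape)  # n inputs now fit in a single input slot
```

Throughput grows with the number of multiplexed instances, while the only extra memory is the fixed projections and the demultiplexing heads.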
DeepMind researchers propose Hierarchical Perceiver (HiP), a model that retains the original Perceiver’s ability to process arbitrary modalities but is faster, can scale up to even more inputs/outputs, reduces the need for input engineering, and improves both efficiency and accuracy on classical computer vision benchmarks.
In the new paper Visual Attention Network, a research team from Tsinghua University and Nankai University introduces a novel large kernel attention (LKA) mechanism for an extremely simple and efficient Visual Attention Network (VAN) that significantly outperforms state-of-the-art vision transformers and convolutional neural networks on various computer vision tasks.
A research team from the University of Hong Kong, Shanghai AI Lab, Huawei Noah’s Ark Lab and the University of Washington takes dataset generation methods via large-scale pretrained language models (PLMs) to the extreme with ZEROGEN, a flexible and efficient zero-shot learning framework via dataset generation.
A team from Facebook AI Research, UC Berkeley and UCLA proposes Online Decision Transformers (ODT), an RL algorithm based on sequence modelling that incorporates offline pretraining and online finetuning in a unified framework and achieves performance competitive with the state-of-the-art models on the D4RL benchmark.
A Google Research team proposes Masked Generative Image Transformer (MaskGIT), a novel image synthesis paradigm that uses a bidirectional transformer decoder. MaskGIT significantly outperforms state-of-the-art transformer models on the ImageNet dataset and accelerates autoregressive decoding by up to 64x.
A Google Brain research team introduces EvoJAX, a JAX-based, scalable, general-purpose, hardware-accelerated neuroevolution toolkit that enables neuroevolution algorithms to work with neural networks running in parallel across multiple TPU/GPUs and achieves significant training speedups.
A research team from UC Berkeley, Amazon Web Services, Google, Shanghai Jiao Tong University and Duke University proposes Alpa, a compiler system for distributed deep learning on GPU clusters that automatically generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on the models they were designed for.
An OpenAI research team presents an expert iteration-based neural theorem prover capable of solving a curriculum of increasingly difficult mathematical problems (such as high-school olympiad-level problems) from a set of formal statements of sufficiently varied difficulty and without the need for associated ground-truth proofs.
A research team from Microsoft and NVIDIA leverages NVIDIA's Megatron-LM and Microsoft's DeepSpeed to create an efficient and scalable 3D parallel system that combines data, pipeline, and tensor-slicing-based parallelism, achieving superior zero-, one-, and few-shot learning accuracies and new state-of-the-art results on NLP benchmarks.
A research team from Mila, Québec Artificial Intelligence Institute, Université de Montréal, CIFAR and IVADO Labs challenges the assumption that task diversity will improve model performance in meta-learning, finding instead that repeating the same tasks over the training phase can achieve performance similar to models trained on uniform sampling.
A research team from Sapienza University and OpenAI introduces an explanatory learning procedure that enables machines to understand existing explanations from symbolic sequences and create new explanations for unexplained phenomena, and further proposes Critical Rationalist Network (CRN) models for discovering explanations for novel phenomena.
An OpenAI research team leverages reinforcement learning from human feedback (RLHF) to make significant progress on aligning language models with user intentions. The proposed InstructGPT models are better at following instructions than GPT-3 while also being more truthful and less toxic.
University of Illinois Urbana-Champaign and Google researchers introduce AutoDistill, an end-to-end fully automated model distillation framework that integrates model architecture exploration and multi-objective optimization for building hardware-efficient pretrained natural language processing models.
In the new paper Laplace Redux — Effortless Bayesian Deep Learning, a research team from the University of Cambridge, University of Tübingen, ETH Zurich and DeepMind conducts extensive experiments demonstrating that the Laplace approximation (LA) is a simple and cost-efficient yet competitive approximation method for inference in Bayesian deep learning.
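The LA's appeal is how little machinery it needs: find the MAP estimate, then fit a Gaussian whose covariance is the inverse Hessian of the negative log posterior at that mode. The one-parameter sketch below uses a hypothetical loss chosen so the answer is easy to check; real Laplace libraries work on full networks with structured Hessian approximations.

```python
def neg_log_post(theta):
    # Hypothetical negative log posterior (e.g. a loss plus a prior term).
    return 0.5 * (theta - 2.0) ** 2 + 0.05 * theta ** 4

def grad(f, x, h=1e-5):
    # Central finite-difference gradient.
    return (f(x + h) - f(x - h)) / (2 * h)

# 1) Find the MAP estimate with plain gradient descent.
theta = 0.0
for _ in range(2000):
    theta -= 0.1 * grad(neg_log_post, theta)

# 2) Curvature at the mode gives the Gaussian's precision: the Laplace
#    posterior is N(theta_MAP, 1 / hessian).
h = 1e-4
hessian = (neg_log_post(theta + h) - 2 * neg_log_post(theta)
           + neg_log_post(theta - h)) / h ** 2
variance = 1.0 / hessian

print(theta, variance)
```

Because steps 1 and 2 reuse quantities standard training already produces (a trained mode and curvature information), the LA adds little cost on top of ordinary deep learning, which is the paper's central argument.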