In the new paper Confident Adaptive Language Modeling, a research team from Google and MIT presents Confident Adaptive Language Modeling (CALM), a framework that dynamically allocates different amounts of compute to each input and generation timestep, achieving up to 3x speedups while maintaining high performance.
Amazon has introduced the latest version of its Sockeye toolkit for the efficient training of stronger and faster neural machine translation (NMT) models. Sockeye 3 achieves speeds up to 126 percent faster than other PyTorch implementations on GPUs and up to 292 percent faster on CPUs.
In the new paper TF-GNN: Graph Neural Networks in TensorFlow, a research team from Google Core ML, Google Research, and DeepMind open-sources the TensorFlow GNN (TF-GNN) scalable library, which leverages heterogeneous relational data to create graph neural network models.
In the new paper YOLOv7: Trainable Bag-Of-Freebies Sets New State-Of-The-Art for Real-Time Object Detectors, an Academia Sinica research team releases YOLOv7. This latest YOLO version introduces novel “extend” and “compound scaling” methods that effectively utilize parameters and computation, and it surpasses all known real-time object detectors in speed and accuracy.
In the new paper Neural Networks and the Chomsky Hierarchy, DeepMind researchers examine generalization in neural network architectures and whether insights from the theory of computation and the Chomsky hierarchy can predict the practical limits of network generalization.
In the new paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, a Salesforce Research team presents CodeRL, a novel framework for program synthesis tasks that employs pretrained language models (LMs) and deep reinforcement learning (RL). CodeRL achieves state-of-the-art performance on the challenging APPS benchmark while also demonstrating impressive zero-shot transfer capabilities.
In the new paper Neural Collapse: A Review on Modelling Principles and Generalization, researchers from New York University analyze Neural Collapse (NC) and present a thought model to explain the effects of variance collapse, aiming at a better understanding of the generalization capabilities of neural networks.
In the new paper Forecasting Future World Events with Neural Networks, a research team from UC Berkeley, MIT, UIUC, and the University of Oxford presents Autocast, a dataset containing thousands of forecasting questions and an accompanying news corpus for measuring neural network models’ automatic forecasting capabilities.
In the new paper DayDreamer: World Models for Physical Robot Learning, researchers from the University of California, Berkeley leverage recent advances in the Dreamer world model to enable online reinforcement learning for robot training without simulators or demonstrations, establishing a strong baseline for efficient real-world robotic learning.
In the new paper p-Meta: Towards On-device Deep Model Adaptation, a research team from ETH Zurich, Singapore Management University and Beihang University proposes p-Meta, a novel meta-learning method for data- and memory-efficient on-device adaptation of deep neural networks for IoT applications.
In the new paper Global Context Vision Transformers, an NVIDIA research team proposes the Global Context Vision Transformer, a novel yet simple hierarchical ViT architecture comprising global self-attention and token generation modules. The architecture enables the efficient modelling of both short- and long-range dependencies without costly compute operations while achieving SOTA results across various computer vision tasks.
In the new paper ReStructured Pre-training, a Carnegie Mellon University research team proposes “reStructured Pre-training” (RST), a novel NLP paradigm that pretrains models over valuable restructured data. The team’s resulting QIN system scores 40 points higher than the student average on the Gaokao-English Exam and 15 points higher than GPT-3 with 1/16 of the parameters.
In the new paper Lossy Compression with Gaussian Diffusion, a Google Research team presents DiffC, a novel and simple lossy compression method that relies only on an unconditionally trained diffusion generative model and achieves state-of-the-art image compression results despite lacking an encoder transform.
In the new paper Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks, a research team from the Allen Institute for AI and the University of Washington introduces UNIFIED-IO, a neural model that achieves strong performance across a wide variety of vision, language, and multi-modal tasks without task- or modality-specific branches or fine-tuning.
In the new paper Evolution Through Large Models, an OpenAI research team shows that large language models (LLMs) trained to generate code in modern programming languages can suggest intelligent mutations, yielding dramatically improved mutation operators for genetic programming.
In the new paper GoodBye WaveNet — A Language Model for Raw Audio with Context of 1/2 Million Samples, Stanford University researcher Prateek Verma presents a generative auto-regressive architecture that models audio waveforms over contexts greater than 500,000 samples and outperforms state-of-the-art WaveNet baselines.
In the new paper Large-Scale Retrieval for Reinforcement Learning, a DeepMind research team dramatically expands the information accessible to reinforcement learning (RL) agents, enabling them to attend to tens of millions of information pieces, incorporate new information without retraining, and learn decision making in an end-to-end manner.
In the new paper LegoNN: Building Modular Encoder-Decoder Models, Meta AI researchers propose LegoNN, a procedure for building encoder-decoder architectures with decoder modules that can be shared across different tasks without finetuning or significant performance reductions.
In the new paper VCT: A Video Compression Transformer, a Google Research team presents an elegantly simple but powerful video compression transformer (VCT) that does not require architectural biases and priors and learns entirely from data without any hand-crafting. VCT is easy to implement and outperforms conventional video compression approaches.
In the new paper Toward a Realistic Model of Speech Processing in the Brain with Self-supervised Learning, researchers show that self-supervised architectures such as Wav2Vec 2.0 can learn brain-like representations from as little as 600 hours of unlabelled speech and can also learn sound-generic and speech- and language-specific representations similar to those of the prefrontal and temporal cortices.
In the new paper Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models, 444 authors from 132 institutions introduce Beyond the Imitation Game (BIG-bench), a large-scale, extremely difficult and diverse benchmark that includes 204 tasks for predicting the potentially transformative effects of large language models.
In the new paper Neural Diffusion Processes, a research team from the University of Cambridge, Secondmind, and Google Research presents Neural Diffusion Processes (NDPs), a novel framework that learns to sample from rich distributions over functions at a lower computational cost than the true Bayesian posterior of a conventional Gaussian process.
In the new paper Is a Modular Architecture Enough?, a research team from Mila and the Université de Montréal conducts a rigorous and thorough quantitative assessment of common modular architectures that reveals the benefits of modularity and sparsity for deep neural networks and the sub-optimality of existing end-to-end learned modular systems.
In the new paper Extreme Compression for Pre-trained Transformers Made Simple and Efficient, a Microsoft research team introduces XTC, a simple yet effective extreme compression pipeline for pretrained transformers that can achieve state-of-the-art results while reducing model size by 50x.
In the new paper Rare Gems: Finding Lottery Tickets at Initialization, a research team from Carnegie Mellon University, MBZUAI, Petuum, Inc and the University of Wisconsin-Madison proposes GEM-MINER, an algorithm that finds sparse subnetworks at initialization that can be trained to accuracy comparable to or better than that of iterative magnitude pruning (IMP) with warm-up.
In the new paper Factory: Fast Contact for Robotic Assembly, a research team from NVIDIA and the University of Washington introduces Factory, a set of physics simulation methods and robot learning tools for simulating contact-rich interactions in assembly with high accuracy, efficiency, and robustness.
In the new paper UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes, a Google Brain research team proposes UViM, a unified approach that leverages language modelling and discrete representation learning to enable the modelling of a wide range of computer vision tasks without task-specific modifications.
In the new paper Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, a Google Brain research team presents Imagen, a text-to-image diffusion model that combines deep language understanding and photorealistic image generation capabilities to achieve a new state-of-the-art FID score of 7.27 on the COCO dataset.
In the new paper Tracing Knowledge in Language Models Back to the Training Data, a team from MIT CSAIL and Google Research proposes a benchmark for tracing language models’ assertions to the associated training data, aiming to establish a principled ground truth and mitigate high compute demands for large neural language model training.
In the new paper Large Language Models are Zero-Shot Reasoners, a research team from the University of Tokyo and Google Brain demonstrates that large language models (LLMs) can become good zero-shot reasoners through the addition of a simple prompt, “Let’s think step by step”, that elicits a step-by-step thinking process before each question is answered. Their Zero-shot-CoT method achieves huge performance gains compared to the zero-shot baseline.
In the new paper Automated Crossword Solving, researchers from UC Berkeley and Matthew Ginsberg LLC present the Berkeley Crossword Solver (BCS), an end-to-end state-of-the-art system for automatically solving challenging crossword puzzles that captured first place in the American Crossword Puzzle Tournament.
In the new paper Masked Autoencoders As Spatiotemporal Learners, a Meta AI research team extends masked autoencoders (MAE) to spatiotemporal representation learning for video. The novel approach introduces negligible inductive biases on space-time while achieving strong empirical results compared to vision transformers (ViTs) and outperforming supervised pretraining by large margins.
In the new paper Meta-Learning Sparse Compression Networks, a DeepMind research team proposes steps for scaling implicit neural representations (INRs). The resulting meta-learning sparse compression networks can represent diverse data modalities such as images, manifolds, signed distance functions, 3D shapes, and scenes, achieving state-of-the-art results on some of them.
In the new paper Rethinking Reinforcement Learning Based Logic Synthesis, a research team from Huawei Noah’s Ark Lab develops a novel reinforcement learning-based logic synthesis method to automatically recognize critical operators and produce common operator sequences that are generalizable to unseen circuits.