In the new paper TVLT: Textless Vision-Language Transformer, researchers from UNC Chapel Hill present the Textless Vision-Language Transformer (TVLT) for vision-and-language representation learning. TVLT uses only raw visual and audio inputs and performs comparably to its text-based counterparts but requires only 1/3 the parameters and achieves 28x faster inference speeds.
In the new paper Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity, a research team from Google and DeepMind proposes Geometric Complexity (GC), a measure of deep neural network model complexity that serves as a useful tool for understanding the underlying mechanisms of complexity control.
In the new paper Progressive Distillation for Dense Retrieval, a research team from Xiamen U and Microsoft Research presents PROD, a progressive distillation method for dense retrieval that achieves state-of-the-art performance on five widely used benchmarks.
In the new paper A Generalist Neural Algorithmic Learner, a research team from DeepMind, University of Oxford, IDSIA, Mila, and Purdue University presents a novel generalist neural algorithmic learner — a single graph neural network (GNN) capable of solving various classical algorithms at single-task expert level.
In the new paper Promptagator: Few-shot Dense Retrieval From 8 Examples, a Google Research team proposes Prompt-based Query Generation for Retriever (Promptagator), a novel and simple approach for few-shot retrieval that leverages large language model (LLM) prompting to generate synthetic task-specific training data.
In the new paper EcoFormer: Energy-Saving Attention with Linear Complexity, a Monash University research team presents EcoFormer, an attention mechanism with linear complexity that replaces expensive multiply-accumulate operations with simple accumulations and achieves a 73 percent energy footprint reduction on ImageNet.
In the new paper Human-level Atari 200x Faster, a DeepMind research team applies a set of diverse strategies to Agent57, with their resulting MEME (Efficient Memory-based Exploration) agent surpassing the human baseline on all 57 Atari games in just 390 million frames — two orders of magnitude faster than Agent57.
In the new paper Vec2text With Round-Trip Translations, Google Brain researchers explore large language models’ capabilities for generating arbitrary natural language text from inputs of fixed-size vectors — a vec2text setting — and propose a simple data augmentation approach based on round-trip translations to improve vec2text model performance.
The new DeepMind paper Data Augmentation for Efficient Learning from Parametric Experts proposes Augmented Policy Cloning (APC), a simple yet effective data-augmentation approach designed to support data-efficient learning from parametric experts. The method significantly improves data efficiency across various control and reinforcement learning settings.
In the new paper Knowledge Neurons in Pretrained Transformers, a research team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons that store factual knowledge in pretrained transformers and leverages these neurons to edit factual knowledge in transformers without any fine-tuning.
In the new paper MO2: Model-Based Offline Options, a DeepMind research team introduces Model-Based Offline Options (MO2), an offline hindsight bottleneck options framework that supports sample-efficient option discovery over continuous state-action spaces for efficient skill transfer to new tasks.
A research team from Microsoft and Harvard University demonstrates that neural networks can discover succinct learning algorithms on their own in polynomial time and presents an architecture that combines recurrent weight-sharing between layers and convolutional weight-sharing to reduce parameter size from even trillions of nodes down to a constant.
In the new paper Decoding Speech From Non-Invasive Brain Recordings, a research team from Meta AI and the Inria Saclay Centre presents a single end-to-end architecture for decoding natural speech processing from non-invasive magnetoencephalography (MEG) or electroencephalography (EEG) brain recordings that can detect macroscopic brain signals in real-time.
In the new paper Faithful Reasoning Using Large Language Models, a DeepMind research team proposes a forward-chaining selection-inference model that performs faithful reasoning and provides a valid reasoning trace to improve reasoning quality and help users validate the model’s final answers.
In the new paper PEER: A Collaborative Language Model, a research team from Meta AI, Carnegie Mellon University, PSL University, and University College London presents PEER, a collaborative language model that performs a humanlike writing process — composing drafts, adding suggestions, proposing edits and providing explanations for its actions.
In the new paper 3D-FM GAN: Towards 3D-Controllable Face Manipulation, a team from Princeton University and Adobe Research presents 3D-FM GAN, a novel conditional GAN framework that enables precise 3D-controllable face manipulation with high photorealism and strong identity preservation without requiring any manual tuning or optimizations.
In the new paper Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, a Microsoft research team presents BEiT-3, a general-purpose state-of-the-art multimodal foundation model for both vision and vision-language tasks that advances the big convergence of backbone architectures, pretraining tasks, and model scaling.
Carnegie Mellon University researchers provide background information and details on contributions to the DialPort project over the last six years in their new paper The DialPort Tools. These tools — such as the DialPort Portal and DialCrowd — will be demoed at the SIGDIAL 2022 conference next month in Edinburgh.
In the new paper Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization, a research team from Microsoft Azure AI and Microsoft Research presents Z-Code++, a novel encoder-decoder pretrained language model optimized for abstractive summarization that significantly improves performance on low-resource summarization tasks.
In the new paper Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing, a research team from Adobe Research and Australian National University presents paint2pix, a novel model that learns to predict users’ intentions and produce photorealistic images from primitive and coarse human brushstroke inputs.
Colossal-AI team and BioMap open-source their latest solution – xTrimo Multimer for protein monomer and multimer structure prediction. This new solution can predict both monomer and multimer structure simultaneously accelerating the process by up to 11 times!
Google Research and Carnegie Mellon University have open-sourced a library for constructing Python program graph representations used in machine learning for code research. Details are presented in the report A Library for Representing Python Programs as Graphs for Machine Learning.
In the new paper Interactive Code Generation via Test-Driven User-Intent Formalization, a team from Microsoft Research, the University of Pennsylvania, and the University of California, San Diego proposes a workflow for test-driven user-intent formalization that leverages user feedback to generate code that is 90.40 percent consistent with user intent.
In the new paper Semi-supervised Vision Transformers at Scale, a research team from AWS AI Labs proposes a semi-supervised learning pipeline for vision transformers that is stable, reduces hyperparameter tuning sensitivity, and outperforms conventional convolutional neural networks.
In the new paper Learning to Improve Code Efficiency, a research team from the Georgia Institute of Technology and Google Research presents a novel discrete generative latent-variable model designed to help programmers identify more computationally efficient code variants, taking a step toward automating the process of code performance optimization.
In the new paper Few-shot Learning With Retrieval Augmented Language Models, a research team from Meta AI, PSL University, Inria, and University College London presents Atlas, a pretrained retrieval augmented language model that effectively learns new knowledge-intensive tasks under few-shot settings. Atlas outperforms the 540B parameter PaLM model on QA tasks while using 50x fewer parameters.
In the new paper BlenderBot 3: A Deployed Conversational Agent That Continually Learns to Responsibly Engage, researchers from Meta AI and Mila/McGill University release BlenderBot 3, a 175B parameter state-of-the-art open-domain dialogue model deployed on a public website. BlenderBot 3 is designed for continual learning via its user interactions.
A Tencent AI Lab research team introduces Efficient and Intelligent Editing (Effidit), a digital writing assistant that leverages large-scale neural language models to provide high-quality assistance in text completion, error checking, text polishing, keywords to sentences (K2S) and cloud input methods (cloud IME).
In the new paper MinVIS: A Minimal Video Instance Segmentation Framework Without Video-based Training, an NVIDIA research team presents MinVIS, a minimal video instance segmentation framework that outperforms state-of-the-art VIS approaches without requiring video-based training.
In the new paper TextWorldExpress: Simulating Text Games at One Million Steps Per Second, a research team from the University of Arizona and Microsoft Research Montréal presents TextWorldExpress, a high-performance text-game simulator that boosts throughput by approximately three orders of magnitude, reaching one million steps per second.
In the new paper Efficient Training of Language Models to Fill in the Middle, an OpenAI research team shows that causal decoder-based autoregressive (AR) language models can learn to infill texts via a very simple and straightforward transformation to the training data and without any architectural modifications.
In the new paper Is Attention All NeRF Needs?, a research team from the Indian Institute of Technology Madras and the University of Texas at Austin proposes Generalizable NeRF Transformer (GNT), a pure and universal transformer-based architecture for efficient on-the-fly reconstruction of NeRFs. The work demonstrates that a pure attention mechanism suffices for learning a physically-grounded rendering process.
In the new paper Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent, a Stanford NLP research team presents Chirpy Cardinal, an open-domain conversational social chatbot with emotional and social intelligence that enables authentic and engaging interactions with real people.
In the new paper Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?, a research team from Google and DeepMind posits that understanding the connections between neural network architectures and scaling laws is essential for designing and evaluating new models. The team pretrains and finetunes over 100 models to reveal useful insights on the scaling behaviours of ten diverse model architectures.