In the new paper Transformers with Multiresolution Attention Heads (currently under double-blind review for ICLR 2023), researchers propose MrsFormer, a novel transformer architecture that uses Multiresolution-head Attention to approximate output sequences and significantly reduces head redundancy without sacrificing accuracy.
In the new paper VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors, researchers from the University of Texas at Austin and Sony AI present VIOLA (Visuomotor Imitation via Object-centric LeArning), an object-centric imitation learning model that endows imitation learning with awareness regarding objects and their interactions.
Colossal-AI releases a complete open-source Stable Diffusion pretraining and fine-tuning solution that reduces the pretraining cost by 6.5 times, and the hardware cost of fine-tuning by 7 times, while simultaneously speeding up the processes! The fine-tuning task flow can also be conveniently completed on an RTX 2070/3050 PC.
In the new paper Locating and Editing Factual Associations in GPT, a research team from MIT CSAIL, Northeastern University and Technion IIT examines how information flows during knowledge recall in large autoregressive transformers and introduces Rank-One Model Editing (ROME), a simple, zero-shot principled model editor capable of locating and editing factual associations in such models.
In the new paper Efficient AlphaFold2 Training using Parallel Evoformer and Branch Parallelism, a Baidu research team presents a Parallel Evoformer and Branch Parallelism approach for efficient AlphaFold2 training. The novel strategy improves AlphaFold2 training speed by up to 38.67 percent without sacrificing performance.
In the new paper Adversarial Policies Beat Professional-Level Go AIs, a research team from MIT, UC Berkeley, and FAR AI employs a novel adversarial policy to attack the state-of-the-art AI Go system KataGo. The team believes theirs is the first successful end-to-end attack against an AI Go system playing at the level of a human professional.
In the new paper When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels, a research team from Meta AI and Columbia University proposes JUICER, a framework that effectively utilizes binary and textual human feedback to improve the conversational responses of dialogue models.
In the new paper RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses, a Google Research team presents RankT5, which employs pretrained T5 models for text ranking with various ranking losses to directly optimize ranking performance. RankT5 models more natively support text ranking by outputting real numbers rather than text tokens.
In the new paper Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization, researchers from Carnegie Mellon University leverage CLIP-guided, pixel-level optimization to generate 720p resolution videos from natural language descriptions at a rate of one-to-two frames per second — taking a big step towards a real-time text-to-video system.
In the new paper Can Language Models Learn From Explanations in Context?, DeepMind researchers investigate how different types of explanations, instructions, and controls affect language models’ zero- and few-shot performance and how such explanations can support in-context learning for large language models on challenging tasks.
In the new paper Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, a Google Research and Stanford University team applies chain-of-thought (CoT) prompting — a series of intermediate reasoning steps — to 23 BIG-Bench tasks on which language models have failed to outperform the average human rater. The proposed approach enables models to surpass human performance on 17 of the 23 tasks.
In the new paper Wide Attention Is The Way Forward For Transformers, a research team from the University of Cambridge, Imperial College London, and the University of Oxford challenges the commonly held belief that deeper is better for transformer architectures, demonstrating that wider layers result in superior performance on natural language processing tasks.
Colossal-AI has successfully used a heterogeneous training strategy to increase the number of NLP model training parameters capacity by hundreds of times at the same hardware. And experiment results show that it only needs to keep 1~5% of the embedding parameters in the GPU, and is still able to maintain excellent end-to-end training speed.
In the new paper Foundation Transformers, a Microsoft team proposes a method for true general-purpose modelling. Their Foundation Transformer is a single unified transformer that provides guaranteed training stability and can handle diverse tasks and modalities without performance degradation.
In the new paper On Distillation of Guided Diffusion Models, researchers from Google Brain and Stanford University propose a novel approach for distilling classifier-free guided diffusion models with high sampling efficiency. The resulting models achieve performance comparable to the original model but with sampling steps reduced by up to 256 times.
In the new paper Ask Me Anything: A Simple Strategy for Prompting Language Models, a research team from Stanford University, Numbers Station, and the University of Wisconsin-Madison presents Ask Me Anything Prompting (AMA), a simple large language model prompting strategy that enables a 30x smaller language model to outperform few-shot GPT3-175B.
In the new paper Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods, DeepMind and NYU Center for Neural Systems researchers introduce computational efficiency evaluation approaches designed to aid in the selection of optimal methods, datasets and models for pretraining visual tasks on a fixed FLOP budget.
In the new paper TVLT: Textless Vision-Language Transformer, researchers from UNC Chapel Hill present the Textless Vision-Language Transformer (TVLT) for vision-and-language representation learning. TVLT uses only raw visual and audio inputs and performs comparably to its text-based counterparts but requires only 1/3 the parameters and achieves 28x faster inference speeds.
In the new paper Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity, a research team from Google and DeepMind proposes Geometric Complexity (GC), a measure of deep neural network model complexity that serves as a useful tool for understanding the underlying mechanisms of complexity control.
In the new paper Progressive Distillation for Dense Retrieval, a research team from Xiamen U and Microsoft Research presents PROD, a progressive distillation method for dense retrieval that achieves state-of-the-art performance on five widely used benchmarks.
In the new paper A Generalist Neural Algorithmic Learner, a research team from DeepMind, University of Oxford, IDSIA, Mila, and Purdue University presents a novel generalist neural algorithmic learner — a single graph neural network (GNN) capable of solving various classical algorithms at single-task expert level.
In the new paper Promptagator: Few-shot Dense Retrieval From 8 Examples, a Google Research team proposes Prompt-based Query Generation for Retriever (Promptagator), a novel and simple approach for few-shot retrieval that leverages large language model (LLM) prompting to generate synthetic task-specific training data.
In the new paper EcoFormer: Energy-Saving Attention with Linear Complexity, a Monash University research team presents EcoFormer, an attention mechanism with linear complexity that replaces expensive multiply-accumulate operations with simple accumulations and achieves a 73 percent energy footprint reduction on ImageNet.
In the new paper Human-level Atari 200x Faster, a DeepMind research team applies a set of diverse strategies to Agent57, with their resulting MEME (Efficient Memory-based Exploration) agent surpassing the human baseline on all 57 Atari games in just 390 million frames — two orders of magnitude faster than Agent57.
In the new paper Vec2text With Round-Trip Translations, Google Brain researchers explore large language models’ capabilities for generating arbitrary natural language text from inputs of fixed-size vectors — a vec2text setting — and propose a simple data augmentation approach based on round-trip translations to improve vec2text model performance.
The new DeepMind paper Data Augmentation for Efficient Learning from Parametric Experts proposes Augmented Policy Cloning (APC), a simple yet effective data-augmentation approach designed to support data-efficient learning from parametric experts. The method significantly improves data efficiency across various control and reinforcement learning settings.
In the new paper Knowledge Neurons in Pretrained Transformers, a research team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons that store factual knowledge in pretrained transformers and leverages these neurons to edit factual knowledge in transformers without any fine-tuning.
In the new paper MO2: Model-Based Offline Options, a DeepMind research team introduces Model-Based Offline Options (MO2), an offline hindsight bottleneck options framework that supports sample-efficient option discovery over continuous state-action spaces for efficient skill transfer to new tasks.
A research team from Microsoft and Harvard University demonstrates that neural networks can discover succinct learning algorithms on their own in polynomial time and presents an architecture that combines recurrent weight-sharing between layers and convolutional weight-sharing to reduce parameter size from even trillions of nodes down to a constant.
In the new paper Decoding Speech From Non-Invasive Brain Recordings, a research team from Meta AI and the Inria Saclay Centre presents a single end-to-end architecture for decoding natural speech processing from non-invasive magnetoencephalography (MEG) or electroencephalography (EEG) brain recordings that can detect macroscopic brain signals in real-time.
In the new paper Faithful Reasoning Using Large Language Models, a DeepMind research team proposes a forward-chaining selection-inference model that performs faithful reasoning and provides a valid reasoning trace to improve reasoning quality and help users validate the model’s final answers.