In the new paper Solving Math Word Problems With Process- and Outcome-based Feedback, a DeepMind research team conducts the first comprehensive comparison between process- and outcome-based model supervision. While the two approaches achieve comparable final-answer error rates on math word problems, the process-based method significantly reduces the reasoning-error rate, from 14.0 to just 3.4 percent.
In the new paper I Can’t Believe There’s No Images! Learning Visual Tasks Using only Language Data, an Allen Institute for Artificial Intelligence team proposes Cross Modal Transfer On Semantic Embeddings (CLOSE), an approach that learns high-level skills from textual data, then uses these skills to complete vision tasks without additional visual training data.
In the NeurIPS 2022 Outstanding Paper Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning, a research team from Stanford University, University of Tübingen and Meta AI demonstrates in theory and practice how data pruning techniques can break beyond the power law scaling of error versus dataset size.
In the NeurIPS 2022 Outstanding Paper Gradient Descent: The Ultimate Optimizer, MIT CSAIL and Meta researchers present a novel technique that enables gradient descent optimizers such as SGD and Adam to tune their hyperparameters automatically. The method requires no manual differentiation and can be stacked recursively to many levels.
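The core idea can be illustrated with a minimal scalar sketch in the spirit of hypergradient descent (this is a simplification for intuition; the paper's actual method differentiates through the optimizer's own update step and can stack such updates recursively):

```python
# Toy sketch: tune the learning rate of SGD by gradient descent on the loss.
# Uses the classic hypergradient identity d(loss)/d(lr) = -grad_t . grad_{t-1},
# applied here to minimizing f(x) = x^2.

def grad(x):
    return 2.0 * x  # gradient of f(x) = x^2

x, lr, hyper_lr = 5.0, 0.01, 0.001
prev_g = 0.0
for _ in range(100):
    g = grad(x)
    lr += hyper_lr * g * prev_g   # hypergradient step on the learning rate
    x -= lr * g                   # ordinary SGD step on the parameter
    prev_g = g
```

Starting from a deliberately poor learning rate of 0.01, the learning rate itself adapts upward and the parameter converges far faster than plain SGD would.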
In the new paper SPACEx: Speech-driven Portrait Animation with Controllable Expression, an NVIDIA research team introduces SPACEx — a speech-driven portrait animation framework that generates high-resolution and expressive facial videos with control over subject pose, emotion and expression intensity.
In the new paper Fixing Model Bugs with Natural Language Patches, researchers from Stanford University and Microsoft Research propose a method that uses declarative statements as feedback for correcting errors in neural models, significantly increasing accuracy without high compute costs.
In the new paper Fast DistilBERT on CPUs, researchers from Intel Corporation and Intel Labs propose a pipeline and hardware-aware extreme compression technique for creating and running fast transformer models on CPUs. The approach achieves impressive speedups and SOTA performance in production environments.
In the new paper Fine-Tuning Language Models via Epistemic Neural Networks, a DeepMind research team modifies large language models to create an Epistemic Neural Network. The novel approach achieves model performance comparable to that obtained via fine-tuning while requiring 50 percent less data.
In the new paper Transformers with Multiresolution Attention Heads (currently under double-blind review for ICLR 2023), researchers propose MrsFormer, a novel transformer architecture that uses Multiresolution-head Attention to approximate output sequences and significantly reduces head redundancy without sacrificing accuracy.
In the new paper VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors, researchers from the University of Texas at Austin and Sony AI present VIOLA (Visuomotor Imitation via Object-centric LeArning), an object-centric imitation learning model that endows imitation learning with awareness regarding objects and their interactions.
Colossal-AI releases a complete open-source Stable Diffusion pretraining and fine-tuning solution that reduces pretraining costs by 6.5 times and fine-tuning hardware costs by 7 times, while speeding up both processes. The fine-tuning workflow can even be completed on a consumer PC with an RTX 2070/3050.
In the new paper Locating and Editing Factual Associations in GPT, a research team from MIT CSAIL, Northeastern University and Technion IIT examines how information flows during knowledge recall in large autoregressive transformers and introduces Rank-One Model Editing (ROME), a simple, zero-shot principled model editor capable of locating and editing factual associations in such models.
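The flavor of a rank-one edit can be sketched as follows (a toy simplification: ROME's actual update is a covariance-weighted closed-form solution applied to a specific MLP layer, not this plain least-change formula):

```python
import numpy as np

# Toy sketch of a rank-one edit: to make a linear layer W map a chosen
# key vector k to a new value v, add the rank-one correction
# (v - W k) k^T / (k^T k). The edit changes W by a single outer product.

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 3))   # hypothetical layer weights
k = rng.standard_normal(3)        # "key" encoding the subject
v = np.array([1.0, 0.0, -1.0, 2.0])  # desired "value" for the new fact

W_edited = W + np.outer(v - W @ k, k) / (k @ k)
```

After the edit, `W_edited @ k` equals `v` exactly, while inputs orthogonal to `k` are unaffected.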
In the new paper Efficient AlphaFold2 Training using Parallel Evoformer and Branch Parallelism, a Baidu research team presents a Parallel Evoformer and Branch Parallelism approach for efficient AlphaFold2 training. The novel strategy improves AlphaFold2 training speed by up to 38.67 percent without sacrificing performance.
In the new paper Adversarial Policies Beat Professional-Level Go AIs, a research team from MIT, UC Berkeley, and FAR AI employs a novel adversarial policy to attack the state-of-the-art AI Go system KataGo. The team believes theirs is the first successful end-to-end attack against an AI Go system playing at the level of a human professional.
In the new paper When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels, a research team from Meta AI and Columbia University proposes JUICER, a framework that effectively utilizes binary and textual human feedback to improve the conversational responses of dialogue models.
In the new paper RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses, a Google Research team presents RankT5, which employs pretrained T5 models for text ranking with various ranking losses to directly optimize ranking performance. RankT5 models more natively support text ranking by outputting real numbers rather than text tokens.
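One of the ranking losses in this family, listwise softmax cross-entropy, can be sketched directly on model scores (a minimal standalone sketch, assuming a single relevant document per candidate list; not RankT5's full training code):

```python
import math

# Listwise softmax cross-entropy ranking loss: given real-valued scores for
# a list of candidate documents and the index of the relevant one, the loss
# is the negative log-softmax of the relevant document's score.

def softmax_ce_ranking_loss(scores, relevant_idx):
    z = sum(math.exp(s) for s in scores)
    return -math.log(math.exp(scores[relevant_idx]) / z)
```

Because the loss is computed over the whole candidate list, lowering it directly pushes the relevant document's score above its competitors, which is what "directly optimizing ranking performance" means here.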
In the new paper Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization, researchers from Carnegie Mellon University leverage CLIP-guided, pixel-level optimization to generate 720p-resolution videos from natural language descriptions at a rate of one to two frames per second — taking a big step towards a real-time text-to-video system.
In the new paper Can Language Models Learn From Explanations in Context?, DeepMind researchers investigate how different types of explanations, instructions, and controls affect language models’ zero- and few-shot performance and how such explanations can support in-context learning for large language models on challenging tasks.
In the new paper Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, a Google Research and Stanford University team applies chain-of-thought (CoT) prompting — a series of intermediate reasoning steps — to 23 BIG-Bench tasks on which language models have failed to outperform the average human rater. The proposed approach enables models to surpass human performance on 17 of the 23 tasks.
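The difference between standard few-shot prompting and CoT prompting comes down to the exemplars: each CoT exemplar spells out intermediate reasoning before the final answer. A toy illustration (this example is illustrative only and is not drawn from BIG-Bench):

```python
# Standard exemplar: question followed directly by the answer.
standard_exemplar = (
    "Q: A juggler has 16 balls. Half are golf balls, and half of the golf "
    "balls are blue. How many blue golf balls are there?\n"
    "A: 4"
)

# CoT exemplar: the same question, but the answer shows its reasoning steps.
cot_exemplar = (
    "Q: A juggler has 16 balls. Half are golf balls, and half of the golf "
    "balls are blue. How many blue golf balls are there?\n"
    "A: There are 16 / 2 = 8 golf balls. Half of them are blue, so there "
    "are 8 / 2 = 4 blue golf balls. The answer is 4."
)

def build_prompt(exemplar, question):
    """Prepend one worked exemplar to a new question."""
    return f"{exemplar}\n\nQ: {question}\nA:"

prompt = build_prompt(cot_exemplar, "A train travels 60 km in 1.5 hours. "
                      "What is its average speed?")
```

Fed a prompt built from CoT exemplars, the model tends to emit its own intermediate steps before the answer, which is what lifts performance on these hard tasks.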
In the new paper Wide Attention Is The Way Forward For Transformers, a research team from the University of Cambridge, Imperial College London, and the University of Oxford challenges the commonly held belief that deeper is better for transformer architectures, demonstrating that wider layers result in superior performance on natural language processing tasks.
Colossal-AI has successfully used a heterogeneous training strategy to increase the trainable parameter capacity of NLP models by hundreds of times on the same hardware. Experiment results show that keeping only 1~5 percent of the embedding parameters on the GPU is enough to maintain excellent end-to-end training speed.
In the new paper Foundation Transformers, a Microsoft team proposes a method for true general-purpose modelling. Their Foundation Transformer is a single unified transformer that provides guaranteed training stability and can handle diverse tasks and modalities without performance degradation.
In the new paper On Distillation of Guided Diffusion Models, researchers from Google Brain and Stanford University propose a novel approach for distilling classifier-free guided diffusion models with high sampling efficiency. The resulting models achieve performance comparable to the original model but with sampling steps reduced by up to 256 times.
In the new paper Ask Me Anything: A Simple Strategy for Prompting Language Models, a research team from Stanford University, Numbers Station, and the University of Wisconsin-Madison presents Ask Me Anything Prompting (AMA), a simple large language model prompting strategy that enables a 30x smaller language model to outperform few-shot GPT3-175B.
In the new paper Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods, DeepMind and NYU Center for Neural Systems researchers introduce computational efficiency evaluation approaches designed to aid in the selection of optimal methods, datasets and models for visual pretraining under a fixed FLOP budget.
In the new paper TVLT: Textless Vision-Language Transformer, researchers from UNC Chapel Hill present the Textless Vision-Language Transformer (TVLT) for vision-and-language representation learning. TVLT uses only raw visual and audio inputs and performs comparably to its text-based counterparts but requires only 1/3 the parameters and achieves 28x faster inference speeds.
In the new paper Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity, a research team from Google and DeepMind proposes Geometric Complexity (GC), a measure of deep neural network model complexity that serves as a useful tool for understanding the underlying mechanisms of complexity control.
In the new paper Progressive Distillation for Dense Retrieval, a research team from Xiamen University and Microsoft Research presents PROD, a progressive distillation method for dense retrieval that achieves state-of-the-art performance on five widely used benchmarks.
In the new paper A Generalist Neural Algorithmic Learner, a research team from DeepMind, University of Oxford, IDSIA, Mila, and Purdue University presents a novel generalist neural algorithmic learner — a single graph neural network (GNN) capable of solving various classical algorithms at single-task expert level.
In the new paper Promptagator: Few-shot Dense Retrieval From 8 Examples, a Google Research team proposes Prompt-based Query Generation for Retriever (Promptagator), a novel and simple approach for few-shot retrieval that leverages large language model (LLM) prompting to generate synthetic task-specific training data.
In the new paper EcoFormer: Energy-Saving Attention with Linear Complexity, a Monash University research team presents EcoFormer, an attention mechanism with linear complexity that replaces expensive multiply-accumulate operations with simple accumulations and achieves a 73 percent energy footprint reduction on ImageNet.
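The energy saving comes from binarization: once queries and keys are mapped to {-1, +1} codes, their similarity needs no real multiplications. A toy demonstration (simplified from EcoFormer, which learns the binarization via kernelized hashing rather than the plain sign function used here):

```python
import numpy as np

# If query and key entries are restricted to {-1, +1}, the dot product
# equals (number of agreeing positions) - (number of disagreeing positions),
# i.e. it can be computed with comparisons and additions alone -- no
# multiply-accumulate operations, which dominate attention's energy cost.

rng = np.random.default_rng(0)
q = np.sign(rng.standard_normal(64))  # binarized query, entries in {-1, +1}
k = np.sign(rng.standard_normal(64))  # binarized key

mac_score = float(q @ k)                            # multiply-accumulate
add_score = float(np.sum(q == k) - np.sum(q != k))  # additions only
```

Both scores are identical, which is why swapping multiply-accumulates for accumulations preserves the attention pattern while cutting the energy footprint.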
In the new paper Human-level Atari 200x Faster, a DeepMind research team applies a set of diverse strategies to Agent57, with their resulting MEME (Efficient Memory-based Exploration) agent surpassing the human baseline on all 57 Atari games in just 390 million frames — two orders of magnitude faster than Agent57.