A research team innovative single-stage category-agnostic diffusion model. This model can generate 3D Neural Radiance Fields (NeRFs) from either text or a single-image input condition through direct model inference, enabling the creation of diverse high-fidelity 3D objects in just 30s/asset.
In a new paper DiLoCo: Distributed Low-Communication Training of Language Models, a Google DeepMind research team presents Distributed Low-Communication (DiLoCo). DiLoCo employs a distributed optimization algorithm that facilitates the training of language models on islands of poorly connected devices, surpassing the performance of fully synchronous models while reducing communication by 500 times.
In a new paper An Embodied Generalist Agent in 3D World, a research team introduces LEO, which stands as an embodied multi-modal and multi-task generalist agent that excels in essential capabilities such as perception, grounding, reasoning, planning, and action within the intricate 3D world.
In a new paper Exponentially Faster Language Modelling, an ETH Zurich research team introduces UltraFastBERT, a variant of the BERT architecture. UltraFastBERT takes a revolutionary approach by replacing feedforward layers with fast feedforward networks, resulting in an impressive 78x speedup over the optimized baseline feedforward implementation.
Microsoft has recently unveiled Orca 2 in a new paper titled “Orca 2: Teaching Small Language Models How to Reason.” to explore how enhanced training signals can augment the reasoning abilities of smaller language models. Notably, Orca 2 surpasses models of similar size, achieving performance levels comparable to or better than models 5-10 times larger.
In a new paper Data Filtering Networks, a research team from Apple and University of Washington introduces the concept of data filtering networks (DFNs). These neural networks, specifically designed for data filtration, demonstrate the capacity to generate extensive, high-quality pre-training datasets efficiently.
In a new paper LRM: Large Reconstruction Model for Single Image to 3D, a research team from Adobe Research and Australian National Univerisity introduces an innovative Large Reconstruction Model (LRM). This groundbreaking model has the remarkable ability to predict a 3D model of an object from a single input image in a mere 5 seconds.
In a new paper E3 TTS: Easy End-to-End Diffusion-based Text to Speech, a Google research team proposes Easy End-to-End Diffusion-based Text to Speech. This streamlined and efficient text-to-speech model hinges solely on diffusion to preserve temporal structure, allowing it to accept plain text as input and generate audio waveforms directly.
An Apple research team presents Large LAnguage model Reinforcement Learning Policy (LLaRP). LLaRP effectively repurposes LLMs for Reinforcement Learning (RL) challenges within the realm of Embodied Artificial Intelligence (AI), achieving a remarkable 1.7 times higher success rate compared to other established baselines and zero-shot LLM applications.
In a new paper Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time, a research team presents DEJAVU, a system that employs a cost-effective algorithm to predict contextual sparsity dynamically for each layer, combined with an asynchronous and hardware-aware implementation to accelerate LLM inference.
A research team from Institute of Science and Technology Austria (ISTA) and Neural Magic Inc. introduces the QMoE framework. This innovative framework offers an effective solution for accurately compressing massive MoEs and conducting swift compressed inference, reducing model sizes by 10–20×, achieving less than 1 bit per parameter.
In a new paper ConvNets Match Vision Transformers at Scale, a Google DeepMind research team challenges the prevailing belief that Vision Transformers possess superior scaling capabilities compared to ConvNets and provides empirical results revealing that ConvNets can indeed hold their own against Vision Transformers at scale.
In a new paper Techniques for Training Consistency Models, an OpenAI research team introduces innovative methods that enable consistency models to learn directly from data, surpassing the performance of consistency distillation (CD) in producing high-quality samples, all while breaking free from the clutches of LPIPS.
In a new paper Large Search Model: Redefining Search Stack in the Era of LLMs, a Microsoft research team presents a novel conceptual framework, large search model, which reimagines the conventional search stack by consolidating various search tasks under a single Large Language Model (LLM).
In a new paper Improving Image Generation with Better Captions, a research team from OpenAI and Microsoft introduces DALL-E 3, a cutting-edge text-to-image generation system that is benchmarked for its prowess in prompt following, coherence, and aesthetics, demonstrating its competitive edge against existing counterparts.
In a new paper SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF, an NVIDIA research team introduces STEERLM, a novel supervised fine-tuning method that empowers end-users to control model responses during inference, surpassing even state-of-the-art baselines, including RLHF models like ChatGPT-3.5.
A Microsoft Azure AI research team introduces “Idea2Img” in their paper, “Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation.”, which leverages the capabilities of GPT-4V(ision) to revolutionize the process of automatic image design and generation with enhanced image quality.
In a new paper MatFormer: Nested Transformer for Elastic Inference, a research team proposes MatFormer, a Transformer architecture that is inherently designed for elasticity, enables the training of a single universal model capable of generating numerous smaller submodels without the need for additional training.
In a new paper DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention, a research team from DeepSpeed of Microsoft presents the DeepSpeed-VisualChat framework, which is designed to optimize LLMs by incorporating multi-modal capabilities, demonstrating superior scalability, even up to a 70 billion parameter model size.
In a new paper HyperAttention: Long-context Attention in Near-Linear Time, a research team from Yale University and Google Research presents HyperAttention, an approximate attention mechanism not only offers practical efficiency but also delivers the best near-linear time guarantee for long contexts processing.
In a new paper Conditional Diffusion Distillation, a research team from Google Research and Johns Hopkins University introduces an innovative framework that distills an unconditional diffusion model into a conditional one, enabling image generation with significantly fewer steps.
In a new paper Decoding speech perception from non-invasive brain recordings, a research team from Meta AI, Inria Saclay and PSL University exhibits the remarkable capability to decode speech from brain signals recorded non-invasively through magneto-encephalography (MEG) or electro-encephalography (EEG).
DeepMind, in collaboration with 33 academic laboratories heralds the arrival of RT-1-X, a novel robotics transformer (RT) model that evolves from RT-1. RT-1-X is meticulously trained on the novel Open X-Embodiment dataset constructed by the researchers and showcases a remarkable 50% improvement in success rates compared to task-specified models.
A Microsoft research team conducts an in-depth analysis of the latest model, GPT-4V(ision). Their report delves into the emerging application scenarios and outlines future research directions for GPT-4V-based systems, with the goal of inspiring research on next-generation multimodal task formulation and the development of more robust LLMs.
A NNAISENSE research team introduces a novel class of generative models known as Bayesian Flow Networks (BFNs). These BFNs combine the power of Bayesian inference with neural networks in an iterative modeling process, enabling successful application to continuous, discretized, and discrete data while maintaining competitive performance.
In a new paper MAPTree: Beating “Optimal” Decision Trees with Bayesian Decision Trees, a Stanford University research team introduces MAPTree, an algorithm that confidently uncovers the maximum a posteriori tree within Bayesian Classification and Regression Trees (BCART) posterior, achieving strong performance with significantly leaner and faster trees.
In a recent paper, “CodePlan: Repository-level Coding using LLMs and Planning,” a team from Microsoft Research introduces CodePlan—a versatile framework designed to address the complexities of repository-level coding tasks, encompassing extensive code changes across large, interconnected codebases.
In a new paper Effective Long-Context Scaling of Foundation Models, a Meta AI research team presents a series of long-context LLMs, built through the pretraining from LLAMA 2. These models support effective context windows of up to 32,768 tokens and outperform all existing open-sourced models in terms of performance.
In a new paper titled “The Reversal Curse: LLMs trained on ‘A is B’ fail to learn ‘B is A'” authored by a collaborative research team from Vanderbilt University, the UK Frontier AI Taskforce, Apollo Research, New York University, the University of Sussex, and the University of Oxford, has unveiled a remarkable shortcoming in auto-regressive large language models (LLMs).
Being at the forefront of cost reduction and efficiency enhancement for large models, the Colossal-AI team maximizes the core capabilities of LLaMA-2. Through innovative training techniques, Colossal-AI has achieved remarkable results by utilizing only approximately 0.0085 trillion tokens of data, investing 15 hours, and incurring training costs in the range of a few hundred dollars.
In a paper titled “Generative Image Dynamics,” a Google research team introduces an innovative approach to model natural oscillation dynamics using a single static image. This approach yields photo-realistic animations derived from a lone image, surpassing the performance of previous methods by a substantial margin.
In a new paper Neurons in Large Language Models: Dead, N-gram, Positional, a research team from Meta AI and Universitat Politècnica de Catalunya conducts comprehensive analysis of a family of Open Pre-trained Transformer Language Models (OPT) up to 66b parameters to provide insights of how feed-forward network (FFN) layers act.
In a new paper Agents: An Open-source Framework for Autonomous Language Agents, a research team from AIWaves Inc., Zhejiang University and ETH Zürich releases AGENTS, an open-source framework that enables non-specialists for developing and deploying state-of-the-art autonomous language agents with minimal coding work.
A Microsoft research team introduce phi-1.5, a 1.3 billion parameter model trained on a vast dataset of 30 billion tokens, remarkably delivering performance that rivals models five times its size. Moreover, it outperforms most non-frontier LLMs in tackling intricate reasoning tasks.
In a new paper Large Language Models as Optimizers, a Google DeepMind research team introduces Optimization by PROmpting (OPRO), an effective method that leverages large language models (LLMs) as optimizers, which can generate optimization solutions conditioned on the natural language that describes the optimization task.
A collaborative research effort from Equall and Apple delves into the role of the FFN and uncovers a surprising revelation: despite consuming a significant portion of the model’s parameters, the FFN exhibits high redundancy. As a result, the researchers propose sharing a single FFN across both the encoder and decoder, thereby reducing the parameter count while causing only a modest drop in accuracy.