In a new paper DiLoCo: Distributed Low-Communication Training of Language Models, a Google DeepMind research team presents Distributed Low-Communication (DiLoCo), a distributed optimization algorithm that enables language models to be trained on islands of poorly connected devices, matching the performance of fully synchronous training while communicating 500 times less.
In a new paper An Embodied Generalist Agent in 3D World, a research team introduces LEO, an embodied multi-modal and multi-task generalist agent that excels in essential capabilities such as perception, grounding, reasoning, planning, and action within the intricate 3D world.
In a new paper Exponentially Faster Language Modelling, an ETH Zurich research team introduces UltraFastBERT, a variant of the BERT architecture. UltraFastBERT takes a revolutionary approach by replacing feedforward layers with fast feedforward networks, resulting in an impressive 78x speedup over the optimized baseline feedforward implementation.
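The key idea behind fast feedforward networks is conditional execution: the layer's neurons are organized as a binary tree, and each input descends a single root-to-leaf path, so only O(log n) of the n neurons are ever evaluated. Below is a minimal pure-Python sketch of that inference pattern; the sizes and random weights are toy assumptions, and UltraFastBERT's actual design (including its conditional matrix multiplication and GeLU activations) is considerably more involved.

```python
import random

random.seed(0)
d, depth = 8, 3                       # toy width and tree depth, not UltraFastBERT's
n_nodes = 2 ** (depth + 1) - 1        # neurons arranged as a full binary tree

# One "neuron" per tree node: an input weight vector and an output weight vector.
w_in = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_nodes)]
w_out = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_nodes)]

def fast_feedforward(x):
    """Evaluate only depth+1 of the n_nodes neurons.

    At each node the pre-activation both contributes to the output (via a
    ReLU here, simplifying the paper's GeLU) and, by its sign, selects which
    child to descend to — so inference costs O(log n) instead of O(n)."""
    y = [0.0] * d
    node, used = 0, 0
    for _ in range(depth + 1):
        pre = sum(wi * xi for wi, xi in zip(w_in[node], x))
        act = max(pre, 0.0)
        y = [yi + act * wo for yi, wo in zip(y, w_out[node])]
        used += 1
        node = 2 * node + (1 if pre > 0 else 2)   # left / right child index
    return y, used

x = [random.gauss(0, 1) for _ in range(d)]
y, neurons_used = fast_feedforward(x)
print(f"used {neurons_used} of {n_nodes} neurons")
```

For this toy tree, one forward pass touches 4 of 15 neurons; at BERT scale the same logarithmic selection is what yields the reported order-of-magnitude speedups.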
An Apple research team presents Large LAnguage model Reinforcement Learning Policy (LLaRP). LLaRP effectively repurposes LLMs for Reinforcement Learning (RL) challenges within the realm of Embodied Artificial Intelligence (AI), achieving a remarkable 1.7 times higher success rate compared to other established baselines and zero-shot LLM applications.
In a new paper Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time, a research team presents DEJAVU, a system that employs a cost-effective algorithm to predict contextual sparsity dynamically for each layer, combined with an asynchronous and hardware-aware implementation to accelerate LLM inference.
A research team from the Institute of Science and Technology Austria (ISTA) and Neural Magic Inc. introduces the QMoE framework, an effective solution for accurately compressing massive Mixture-of-Experts (MoE) models and conducting swift compressed inference, reducing model sizes by 10–20× and achieving less than 1 bit per parameter.
In a new paper SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF, an NVIDIA research team introduces SteerLM, a novel supervised fine-tuning method that empowers end-users to control model responses during inference, surpassing state-of-the-art baselines, including RLHF models such as ChatGPT-3.5.
A Microsoft research team conducts an in-depth analysis of the latest model, GPT-4V(ision). Their report delves into the emerging application scenarios and outlines future research directions for GPT-4V-based systems, with the goal of inspiring research on next-generation multimodal task formulation and the development of more robust LLMs.
In a recent paper, “CodePlan: Repository-level Coding using LLMs and Planning,” a team from Microsoft Research introduces CodePlan—a versatile framework designed to address the complexities of repository-level coding tasks, encompassing extensive code changes across large, interconnected codebases.
In a new paper Effective Long-Context Scaling of Foundation Models, a Meta AI research team presents a series of long-context LLMs built by continual pretraining from Llama 2. These models support effective context windows of up to 32,768 tokens and outperform all existing open-source models.
In a new paper titled “The Reversal Curse: LLMs trained on ‘A is B’ fail to learn ‘B is A’”, a collaborative research team from Vanderbilt University, the UK Frontier AI Taskforce, Apollo Research, New York University, the University of Sussex, and the University of Oxford unveils a remarkable shortcoming in auto-regressive large language models (LLMs).
In a new paper Neurons in Large Language Models: Dead, N-gram, Positional, a research team from Meta AI and Universitat Politècnica de Catalunya conducts a comprehensive analysis of a family of Open Pre-trained Transformer Language Models (OPT) with up to 66B parameters to provide insights into how feed-forward network (FFN) layers behave.
A Microsoft research team introduces phi-1.5, a 1.3-billion-parameter model trained on a dataset of 30 billion tokens that remarkably delivers performance rivaling models five times its size. Moreover, it outperforms most non-frontier LLMs on intricate reasoning tasks.
In a new paper Large Language Models as Optimizers, a Google DeepMind research team introduces Optimization by PROmpting (OPRO), an effective method that leverages large language models (LLMs) as optimizers, generating optimization solutions conditioned on a natural-language description of the optimization task.
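OPRO's loop is simple to state: maintain a trajectory of (solution, score) pairs, embed the best ones in a meta-prompt, ask the LLM for a better solution, score it, and repeat. The sketch below follows that loop for prompt optimization, but replaces both the optimizer LLM and the task evaluation with deterministic stand-ins (`llm_propose` and `score` are invented stubs, not the paper's setup), since a real run would call a served model and evaluate on a scored training set.

```python
import random

random.seed(0)

def score(prompt):
    """Stand-in for task accuracy: a toy objective rewarding two keywords.
    A real OPRO run would evaluate the candidate prompt on held-out examples."""
    return ("step by step" in prompt) + ("carefully" in prompt)

def llm_propose(meta_prompt):
    """Stand-in optimizer LLM. OPRO would send the meta-prompt (task
    description plus prior solution/score pairs) to a model and parse its
    reply; here we just sample phrases to keep the sketch self-contained."""
    bits = ["Solve the problem.", "Think step by step.", "Answer carefully."]
    return " ".join(random.sample(bits, k=random.randint(1, 3))).lower()

# OPRO-style loop: feed the highest-scoring prior solutions back into the
# meta-prompt, request a new candidate, score it, and extend the trajectory.
trajectory = [("solve the problem.", 0)]
for step in range(20):
    top = sorted(trajectory, key=lambda ps: ps[1], reverse=True)[:5]
    meta_prompt = ("Prior prompts and scores:\n"
                   + "\n".join(f"{s}: {p}" for p, s in top)
                   + "\nWrite a better prompt.")
    candidate = llm_propose(meta_prompt)
    trajectory.append((candidate, score(candidate)))

best_prompt, best_score = max(trajectory, key=lambda ps: ps[1])
print(best_score, best_prompt)
```

The essential design point survives the stubbing: the optimizer never sees gradients, only a natural-language description of the task plus past solutions and their scores.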
In a new paper MEMORY-VQ: Compression for Tractable Internet-Scale Memory, a Google research team introduces MEMORY-VQ, a novel method that significantly reduces storage requirements for memory-based methods while maintaining high performance, achieving a 16× compression rate on the KILT benchmark.
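The storage arithmetic behind this kind of compression is easy to illustrate: instead of keeping every memory vector in full precision, store a small learned codebook plus one integer code per vector. The sketch below uses plain vector quantization (a few Lloyd/k-means steps) at toy scale; MEMORY-VQ itself uses product quantization with per-subvector codebooks, so treat this only as an illustration of the trade, not the paper's method.

```python
import random

random.seed(0)
n, d, k = 64, 8, 4     # toy scale: 64 memory vectors of dim 8, 4 codebook entries
memories = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Plain VQ as a stand-in for MEMORY-VQ's product quantization: keep one small
# codebook plus a 1-byte code per memory instead of n full float32 vectors,
# trading exactness for a large storage reduction.
codebook = [memories[i][:] for i in random.sample(range(n), k)]
for _ in range(10):                                   # a few Lloyd iterations
    codes = [min(range(k), key=lambda j: sq_dist(m, codebook[j]))
             for m in memories]
    for j in range(k):
        members = [m for m, c in zip(memories, codes) if c == j]
        if members:                                   # recenter each codeword
            codebook[j] = [sum(col) / len(members) for col in zip(*members)]

original_bytes = n * d * 4                            # float32 memories
compressed_bytes = k * d * 4 + n * 1                  # codebook + uint8 codes
print(f"compression: {original_bytes / compressed_bytes:.1f}x")
```

Even at this toy scale the codebook-plus-codes representation is roughly 10× smaller; product quantization pushes the ratio further by quantizing subvectors independently.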
In a new paper AskIt: Unified Programming Interface for Programming with Large Language Models, an MIT CSAIL research team presents AskIt, a domain-specific language (DSL) tailored for LLMs that accommodates a wide variety of tasks, substantially reducing practitioners’ development overhead and effort when building LLM-based software.
In a new paper Prompt2Model: Generating Deployable Models from Natural Language Instructions, a research team from Carnegie Mellon University and Tsinghua University introduces Prompt2Model, a general-purpose approach that uses prompting to specify system behavior and yields a deployable special-purpose model that retains the advantages of both.
In a new paper Platypus: Quick, Cheap, and Powerful Refinement of LLMs, a Boston University research team presents Platypus, a family of fine-tuned and merged Large Language Models (LLMs) that achieved first place on HuggingFace’s Open LLM Leaderboard through quick, cheap, and powerful refinement of conventional LLMs.
In a new paper ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, a research team from Tsinghua University, ModelBest Inc., Renmin University of China, Yale University, Tencent Inc., and Zhihu Inc. presents ToolLLM, a general tool-use framework that demonstrates a compelling capability to master 16,464 real-world RESTful APIs.
In a new paper Llama 2: Open Foundation and Fine-Tuned Chat Models, a Meta AI research team presents and releases Llama 2 and Llama 2-Chat: the former a family of pretrained and fine-tuned LLMs, and the latter a version of Llama 2 optimized for dialogue, paving the way toward more responsible LLMs.
Colossal-AI—the world’s largest and most active big-model development tool and community—uses the widely adopted LLaMA model to demonstrate its pre-training solution for 65-billion-parameter models, improving training speed by 38%.
In a new paper Personality Traits in Large Language Models, a research team from Google, Cambridge University, and Keio University proposes principled, validated methods for characterizing personality in LLMs, simulates population variance in LLM responses, and develops a personality-shaping mechanism to control LLM personality traits.
In a new paper Language to Rewards for Robotic Skill Synthesis, a Google DeepMind research team proposes a new paradigm that leverages reward functions as the interface between language and low-level robot actions, enabling non-technical users to steer novel and intricate robot behaviors without large amounts of data or the expert knowledge needed to engineer low-level primitives.
In a new paper FinGPT: Open-Source Financial Large Language Models, a research team from Columbia University and New York University (Shanghai) presents FinGPT, an end-to-end open-source framework for financial large language models (FinLLMs) that democratizes financial data and encourages researchers and practitioners to develop their own user-customized FinLLMs.
In a new paper CodeTF: One-stop Transformer Library for State-of-the-art Code LLM, a Salesforce AI research team develops CodeTF, an open-source, one-stop, comprehensive Python library that provides a seamless interface for training and inference on code intelligence tasks, aiming to facilitate easy integration of state-of-the-art language models into real-world applications.
In the new paper DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining, a research team from Google and Stanford University introduces Domain Reweighting with Minimax Optimization (DoReMi), a domain weight optimization strategy that leverages distributionally robust optimization (DRO) to substantially speed up effective language model pretraining.
In the new paper READ: Recurrent Adaptation of Large Transformers, a Meta AI research team proposes REcurrent ADaption (READ), a lightweight and memory-efficient fine-tuning approach that achieves a 56 percent reduction in memory consumption and an 84 percent reduction in GPU energy usage.
In the new paper ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities, a research team from Alibaba Group’s DAMO Academy and the Huazhong University of Science and Technology releases ONE-PEACE, a highly extensible model that can align and integrate representations across vision, audio, and language modalities, opening a path toward the creation of a general representation model for unlimited modalities.
In the new paper CodeT5+: Open Code Large Language Models for Code Understanding and Generation, a Salesforce AI Research team presents CodeT5+, a novel family of encoder-decoder code foundation large language models that can be flexibly adapted to a wide range of code understanding and generation tasks and achieve strong results across various code-related benchmarks.
In the new paper StarCoder: May the Source Be With You!, the BigCode community releases StarCoder and StarCoderBase, 15.5B parameter open-access large language models (LLMs) trained on 80+ programming languages. StarCoderBase outperforms all multi-programming-language code LLMs, and StarCoder surpasses all models fine-tuned on Python.
In the new paper VideoChat: Chat-Centric Video Understanding, a research team from Shanghai AI Laboratory, Nanjing University, the University of Hong Kong, and the Chinese Academy of Sciences presents VideoChat, a groundbreaking end-to-end chat-centric video understanding system that leverages state-of-the-art video and language models to improve spatiotemporal reasoning, event localization, and causal relationship inference.
In the new paper Automatic Prompt Optimization with “Gradient Descent” and Beam Search, a Microsoft research team presents Automatic Prompt Optimization, a simple and general prompt optimization algorithm that automatically improves prompts for large language models, significantly reducing the time and energy spent on manual prompting approaches.
In the new paper ResiDual: Transformer With Dual Residual Connections, a team from Microsoft Research, Microsoft Azure Translation, and Renmin University of China proposes ResiDual, a novel transformer architecture that fuses the connections in post-layer normalization and pre-layer normalization to exploit the benefits of both while also addressing their limitations.
In the new paper Inference with Reference: Lossless Acceleration of Large Language Models, a Microsoft research team proposes LLMA, an inference-with-reference decoding mechanism that achieves up to 2x lossless speed-ups with identical generation results by exploiting the overlaps between LLM outputs and references.
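The lossless speed-up comes from a simple observation: in tasks like retrieval-augmented generation or editing, the output heavily overlaps with a reference document, so the decoder can speculatively copy a span from the reference and keep only the tokens the model itself would have produced. The sketch below follows that copy-then-verify shape with a deterministic toy "model" (`model_next_token` and the arithmetic token sequence are invented stand-ins); the real LLMA gains come from verifying the whole copied span in a single batched forward pass rather than one token at a time.

```python
def llma_decode(prefix, reference, model_next_token, max_new=40, copy_len=4):
    """Inference-with-reference sketch: when the last generated token occurs in
    the reference, speculatively copy the following span and keep only tokens
    the model itself would emit — so the output is identical to plain decoding.

    `model_next_token(tokens)` stands in for one language-model decoding step."""
    out = list(prefix)
    while len(out) < len(prefix) + max_new:
        try:
            i = reference.index(out[-1])
            draft = reference[i + 1 : i + 1 + copy_len]   # speculative copy
        except ValueError:
            draft = []
        if not draft:
            out.append(model_next_token(out))             # ordinary decoding step
            continue
        # Verify the copied span token by token; a real system checks all
        # positions in ONE forward pass, which is where the speed-up comes from.
        for tok in draft:
            if model_next_token(out) == tok:
                out.append(tok)                           # copy accepted
            else:
                out.append(model_next_token(out))         # reject, keep model token
                break
    return out

# Toy "model": deterministically continues an arithmetic token sequence, so we
# can check that copying from a partially wrong reference changes nothing.
model = lambda toks: toks[-1] + 1
result = llma_decode([1, 2], reference=[3, 4, 5, 9], model_next_token=model, max_new=5)
print(result)
```

Here the reference supplies the correct continuation 3, 4, 5 but then diverges (9); verification rejects the divergent token and decoding falls back to the model, so the final output matches what plain autoregressive decoding would produce.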
In the new paper BloombergGPT: A Large Language Model for Finance, a research team from Bloomberg and Johns Hopkins University presents BloombergGPT, a 50 billion parameter language model trained on a 700 billion token dataset that significantly outperforms current benchmark models on financial tasks.
In the new paper Sparks of Artificial General Intelligence: Early Experiments with GPT-4, a Microsoft Research team investigates GPT-4, demonstrating its ability to achieve human-level performance on novel and difficult tasks in domains ranging from mathematics and coding to vision, medicine, law and psychology; and proposing it as an early version of an AGI system.