In a paper titled “Generative Image Dynamics,” a Google research team introduces an innovative approach that models natural oscillation dynamics from a single static image, yielding photo-realistic animations that surpass previous methods by a substantial margin.
In a new paper Neurons in Large Language Models: Dead, N-gram, Positional, a research team from Meta AI and Universitat Politècnica de Catalunya conducts a comprehensive analysis of the Open Pre-trained Transformer (OPT) family of language models up to 66B parameters to provide insights into how feed-forward network (FFN) layers behave.
In a new paper Agents: An Open-source Framework for Autonomous Language Agents, a research team from AIWaves Inc., Zhejiang University and ETH Zürich releases AGENTS, an open-source framework that enables non-specialists to develop and deploy state-of-the-art autonomous language agents with minimal coding.
A Microsoft research team introduces phi-1.5, a 1.3 billion parameter model trained on a dataset of 30 billion tokens, remarkably delivering performance that rivals models five times its size. Moreover, it outperforms most non-frontier LLMs on intricate reasoning tasks.
An Apple research team introduces sparse Mobile Vision MoEs (V-MoEs), a streamlined, mobile-friendly Mixture-of-Experts architecture that efficiently scales down Vision Transformers (ViTs) while preserving strong model performance.
In a new paper Large Language Models as Optimizers, a Google DeepMind research team introduces Optimization by PROmpting (OPRO), an effective method that leverages large language models (LLMs) as optimizers, generating optimization solutions conditioned on a natural-language description of the optimization task.
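To make the mechanism concrete, here is a minimal sketch of an OPRO-style loop: a meta-prompt lists previously evaluated solutions with their scores and asks the model to propose a better one. The `llm` and `score` callables and the prompt wording are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative OPRO-style loop; `llm` and `score` are hypothetical callables.
def opro_step(llm, task_description, history):
    # Build a meta-prompt from the task description and previously
    # scored solutions, best-scoring last.
    lines = [task_description, "Previously evaluated solutions (solution -> score):"]
    for solution, s in sorted(history, key=lambda x: x[1]):
        lines.append(f"{solution} -> {s}")
    lines.append("Propose a new solution that scores higher than all of the above.")
    return llm("\n".join(lines))

def optimize(llm, score, task_description, steps=20):
    history = []
    for _ in range(steps):
        candidate = opro_step(llm, task_description, history)
        history.append((candidate, score(candidate)))
    # Return the best solution found and its score.
    return max(history, key=lambda x: x[1])
```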
A collaborative research effort from Equall and Apple delves into the role of the FFN and uncovers a surprising revelation: despite consuming a significant portion of the model’s parameters, the FFN exhibits high redundancy. As a result, the researchers propose sharing a single FFN across both the encoder and decoder, thereby reducing the parameter count while causing only a modest drop in accuracy.
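As a rough illustration of the weight-sharing idea (a minimal PyTorch sketch under assumed dimensions, not the authors' code), the encoder and decoder layers can simply hold references to one and the same FFN module, so its parameters are stored and updated only once:

```python
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)

d_model, d_ff = 512, 2048
shared_ffn = FeedForward(d_model, d_ff)  # a single set of FFN weights

# Encoder and decoder layers reference the same module instead of
# each owning their own FFN, cutting the FFN parameter count.
encoder_ffn = shared_ffn
decoder_ffn = shared_ffn
```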
In a new paper MEMORY-VQ: Compression for Tractable Internet-Scale Memory, a Google research team introduces MEMORY-VQ, a novel method that significantly reduces storage requirements for memory-based methods while maintaining high performance, achieving a 16x compression rate on the KILT benchmark.
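The exact pipeline is described in the paper; as a generic illustration of the underlying idea of vector-quantizing stored memories, the sketch below product-quantizes a memory matrix so each vector is stored as a few one-byte codebook indices rather than full-precision floats. The function name, subspace count, and the use of scikit-learn's KMeans are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def product_quantize(memory, num_subspaces=8, codebook_size=256):
    """Compress a memory matrix (n_items x dim): split each vector into
    sub-vectors and keep only a codebook index per sub-vector."""
    n, dim = memory.shape
    sub_dim = dim // num_subspaces
    codebooks, codes = [], []
    for s in range(num_subspaces):
        chunk = memory[:, s * sub_dim:(s + 1) * sub_dim]
        km = KMeans(n_clusters=codebook_size, n_init=1).fit(chunk)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_.astype(np.uint8))  # one byte per sub-vector
    # Storage drops from n*dim floats to n*num_subspaces bytes plus small codebooks.
    return np.stack(codes, axis=1), codebooks
```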
In a new paper AskIt: Unified Programming Interface for Programming with Large Language Models, an MIT CSAIL research team presents AskIt, a domain-specific language (DSL) tailored for LLMs to accommodate a wide variety of tasks, substantially reducing practitioners’ development overhead and effort in building software.
Colossal-AI delivers revolutionary LLaMA2 training efficiency on 8 to 512 GPUs, along with fine-tuning and inference solutions: training of the 70-billion-parameter model can be accelerated by 195%. It also provides a fully managed ML cloud platform, greatly reducing the cost of large-model development and applications.
A Meta AI research team presents Neural Optical Understanding for Academic Documents (Nougat), a Visual Transformer model that can effectively convert scientific documents stored in PDF format into a lightweight markup language, even when dense mathematical equations are involved.
In a new paper Prompt2Model: Generating Deployable Models from Natural Language Instructions, a research team from Carnegie Mellon University and Tsinghua University introduces Prompt2Model, a general-purpose approach that uses prompting to specify system behavior while producing a deployable special-purpose model that enjoys all the advantages thereof.
In a new paper Diversifying AI: Towards Creative Chess with AlphaZero, a Google DeepMind research team explores whether artificial intelligence can benefit from creative problem-solving mechanisms identified in human intelligence while pushing the limits of its computational rationality.
In a new paper Composable Function-preserving Expansions for Transformer Architectures, a research team from Google DeepMind and University of Toulouse introduces parameter-expansion transformations for transformer-based neural networks that preserve functionality, enabling model capacity to be expanded as needed.
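As a minimal sketch of one generic function-preserving trick (not the paper's full set of transformations), the snippet below widens the hidden layer of a two-layer MLP: the new hidden units receive freshly initialized input weights but zero output weights, so the expanded network computes exactly the same function as before and can then be trained further at the larger capacity.

```python
import torch
import torch.nn as nn

def widen_mlp(w_in: nn.Linear, w_out: nn.Linear, extra: int):
    """Grow the hidden width of a two-layer MLP by `extra` units without
    changing its function: new hidden units get fresh input weights but
    zero output weights, so they contribute nothing initially."""
    new_in = nn.Linear(w_in.in_features, w_in.out_features + extra)
    new_out = nn.Linear(w_out.in_features + extra, w_out.out_features)
    with torch.no_grad():
        new_in.weight[: w_in.out_features] = w_in.weight
        new_in.bias[: w_in.out_features] = w_in.bias
        new_out.weight[:, : w_out.in_features] = w_out.weight
        new_out.weight[:, w_out.in_features:] = 0.0  # zero => function preserved
        new_out.bias.copy_(w_out.bias)
    return new_in, new_out
```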
In a new paper SpeechX: Neural Codec Language Model as a Versatile Speech Transformer, a Microsoft research team presents SpeechX, a versatile, robust, and extensible speech generation model capable of addressing zero-shot TTS and various speech transformation tasks, handling both clean and noisy signals.
In a new paper Platypus: Quick, Cheap, and Powerful Refinement of LLMs, a Boston University research team presents Platypus, a family of fine-tuned and merged Large Language Models (LLMs) that achieves first place on HuggingFace’s Open LLM Leaderboard through quick, cheap, and powerful refinement of conventional LLMs.
In a new paper Bayesian Flow Networks, the NNAISENSE research team presents Bayesian Flow Networks (BFNs), a novel family of generative models that operates on the parameters of a data distribution rather than on noisy data, providing an effective way to handle discrete data.
In a new paper Follow Anything: Open-set detection, tracking, and following in real-time, a research team from MIT and Harvard University presents the follow anything system (FAn), an open-set, real-time object-following framework that can detect, segment, track, and follow any object, and can adapt to new objects using text, image, or click queries.
In a new paper Shepherd: A Critic for Language Model Generation, a Meta AI research team presents Shepherd, a language model explicitly tuned to critique model-generated outputs and to generate feedback suggesting improvements on factuality, logical errors, coherence, and alignment.
In a new paper JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models, a Futureverse research team presents JEN-1, a universal framework that combines bidirectional and unidirectional modes to generate high-quality music conditioned on either text or music representations.
In a new paper AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning, a DeepMind research team presents AlphaStar Unplugged, an unprecedentedly challenging large-scale offline reinforcement learning benchmark that leverages an offline dataset of StarCraft II games for training RL agents.
In a new paper DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales, Microsoft’s DeepSpeed research team presents DeepSpeed-Chat, a novel end-to-end RLHF pipeline that provides easy-to-use training and inference for ChatGPT-like models at scale.
In a new paper A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis, a research team from Google DeepMind and The University of Tokyo presents WebAgent, an LLM-driven real-world web navigation agent that can complete tasks on real websites following natural language instructions.
In a new paper ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, a research team from Tsinghua University, ModelBest Inc., Renmin University of China, Yale University, Tencent Inc. and Zhihu Inc. presents ToolLLM, a general tool-use framework that demonstrates a compelling capability to master 16,464 real-world RESTful APIs.
In a new paper Towards Generalist Biomedical AI, a research team from Google Research and Google DeepMind presents Med-PaLM Multimodal (Med-PaLM M), a large multimodal generative model that can process multi-modal biomedical data including clinical language, imaging, and genomics using a single set of model weights without any task-specific modification.
In a new paper Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning, a Stanford University research team affirms that simple language skills can emerge in meta-RL agents without direct language supervision, testing this hypothesis in their customized multi-task environment.
In a new paper FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields, a research team from KAIST and Scatter Lab introduces FaceCLIPNeRF, a novel text-driven pipeline that enables high-quality face manipulation using deformable NeRF without extensive human labor.
In a new paper Brain2Music: Reconstructing Music from Human Brain Activity, a research team from Google, Osaka University, NICT and Araya Inc. introduces Brain2Music, an approach for reconstructing music from brain activity via MusicLM, aiming to gain insight into the relationships between brain activity and human cognitive and emotional experiences.
In a new paper A Definition of Continual Reinforcement Learning, a DeepMind research team rethinks RL problems as endless adaptation and provides a clean, general, and precise mathematical definition of continual reinforcement learning (CRL), aiming to promote research on CRL from a solid conceptual foundation.
In a new paper Objaverse-XL: A Universe of 10M+ 3D Objects, a research team from Allen Institute for AI, University of Washington, Columbia University, Stability AI, California Institute of Technology, and LAION joins forces to present Objaverse-XL, a large-scale, web-crawled dataset of 3D assets that provides substantially richer and higher-quality data, aiming to boost the performance of state-of-the-art 3D models.
In a new paper Llama 2: Open Foundation and Fine-Tuned Chat Models, a Meta AI research team presents and releases Llama 2 and Llama 2-Chat: the former is a family of pretrained and fine-tuned LLMs, and the latter is a version of Llama 2 fine-tuned and optimized for dialogue, paving the way toward more responsible LLMs.
Colossal-AI, the world’s largest and most active big-model development tool and community, uses LLaMA, currently the most widely used large model, to demonstrate its groundbreaking pre-training solution for a 65-billion-parameter model, which improves training speed by 38%.