In a new paper Llama 2: Open Foundation and Fine-Tuned Chat Models, a Meta AI research team presents and releases Llama 2 and Llama 2-Chat. The former is a family of pretrained and fine-tuned LLMs, and the latter is a fine-tuned version of Llama 2 optimized for dialogue, paving the way for the development of more responsible LLMs.
In a new paper AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning, a research team presents AnimateDiff, a general and practical framework that can generate animated images for any personalized text-to-image (T2I) model without any extra training or model-specific tuning.
In a new paper General Part Assembly Planning, a research team from Columbia University and Google DeepMind introduces the General Part Assembly Transformer (GPAT), a transformer-based model for assembly planning that generalizes strongly to novel and diverse target and part shapes.
In a new paper SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs, a research team from Google Research and Carnegie Mellon University introduces Semantic Pyramid AutoEncoder (SPAE), the first successful method for enabling frozen LLMs to solve cross-modal tasks.
In a new paper LongNet: Scaling Transformers to 1,000,000,000 Tokens, a Microsoft research team presents LONGNET, a Transformer variant that successfully scales sequence length to more than 1 billion tokens while maintaining strong performance and linear computational complexity.
In a new paper Personality Traits in Large Language Models, a research team from Google, Cambridge University and Keio University proposes principled, validated methods for establishing the construct validity of personality characterizations in LLMs, simulates population variance in LLM responses, and develops a personality shaping mechanism to control LLM personality traits.
In a new paper Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning, a Google research team proposes THOUGHT EXPERIMENTS, a new prompting framework that instructs language models to perform better moral reasoning using counterfactuals, boosting Moral Scenarios task accuracy by 9-16%.
In a new paper Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision, a Microsoft research team presents Pareto optimal self-supervision, a flexible framework that leverages programmatic supervision to automatically calibrate and correct errors in large language models without extra manual effort.
In a new paper Language to Rewards for Robotic Skill Synthesis, a Google DeepMind research team proposes a new paradigm that uses reward functions as the interface between language and low-level robot actions, enabling non-technical users to steer novel and intricate robot actions without large amounts of data or the expert knowledge needed to engineer low-level primitives.
In a new paper Fast Segment Anything, a research team from Chinese Academy of Sciences, University of Chinese Academy of Sciences, Objecteye Inc. and Wuhan AI Research presents FastSAM, a real-time solution for the segment anything task that achieves comparable performance to SAM while drastically reducing computational demands.
In a new paper Scaling Open-Vocabulary Object Detection, a DeepMind research team introduces the OWLv2 model, an optimized architecture with improved training efficiency, and applies an OWL-ST self-training recipe to the proposed OWLv2 to substantially improve detection performance, achieving state-of-the-art results on the open-vocabulary detection task.
In a new paper Infinite Photorealistic Worlds using Procedural Generation, a Princeton University research team presents Infinigen, a procedural generator of photorealistic 3D scenes that can produce unlimited, diverse training data of the natural world, substantially expanding the coverage of existing synthetic data.
In a new paper High-Fidelity Audio Compression with Improved RVQGAN, a Descript research team presents Improved RVQGAN, a high-fidelity universal audio compression model that combines advances in high-fidelity audio generation with improved adversarial and reconstruction losses to achieve 90x compression of 44.1 kHz audio at only 8 kbps bandwidth.
The BAAI 2023 Conference in Beijing successfully closed on June 10. Over a busy two-day agenda, the host, the Beijing Academy of Artificial Intelligence (BAAI), welcomed numerous renowned AI scholars, seasoned industry leaders, and enthusiastic AI researchers to share their insights on the latest hot topics in AI.
In a new paper Prodigy: An Expeditiously Adaptive Parameter-Free Learner, a research team from Samsung AI Center and Meta AI presents two novel modifications, Prodigy and Resetting, to enhance the D-Adaptation method’s worst-case non-asymptotic convergence rate, achieving faster convergence rates and better optimization outputs.
In a new paper Image Captioners Are Scalable Vision Learners Too, a DeepMind research team presents CapPa, an image-captioning-based pretraining strategy that can compete with CLIP and exhibits favorable model and data scaling properties, verifying that plain image captioning can be a competitive pretraining strategy for vision backbones.
In a new paper FinGPT: Open-Source Financial Large Language Models, a research team from Columbia University and New York University (Shanghai) presents FinGPT, an end-to-end open-source framework for financial large language models (FinLLMs) that democratizes financial data to encourage researchers and practitioners to develop user-specific FinLLMs.
In a new paper From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces, a research team from Google and DeepMind proposes PIX2ACT, a Transformer-based image-to-text model that is able to generate outputs corresponding to mouse and keyboard actions based solely on pixel-based screenshots from Graphical User Interfaces (GUIs).
In a new paper CodeTF: One-stop Transformer Library for State-of-the-art Code LLM, a Salesforce AI research team develops CodeTF, an open-source, one-stop Python library that provides a seamless interface for training and inference on code intelligence tasks, aiming to facilitate easy integration of state-of-the-art language models into real-world applications.
In a new paper Faster sorting algorithms discovered using deep reinforcement learning, a DeepMind research team introduces AlphaDev, a deep reinforcement learning agent that can automatically discover correct and efficient sorting algorithms that outperform previously known human benchmarks.
In a new paper Orca: Progressive Learning from Complex Explanation Traces of GPT-4, a Microsoft research team introduces Orca, a 13-billion parameter model that learns from GPT-4's explanation traces, step-by-step thought processes, and complex instructions to significantly surpass SOTA instruction-tuned models.
In a new paper LLaVA-Med: Training a Large Language-and-Vision Assistant, a Microsoft research team proposes a Large Language and Vision Assistant for BioMedicine (LLaVA-Med), which can be trained in less than 15 hours and demonstrates strong multimodal conversational capability, helping to answer inquiries about biomedical images.
In a new paper Bigger, Better, Faster: Human-level Atari with human-level efficiency, a research team from Google DeepMind, Mila and Université de Montréal presents a value-based RL agent, which they call Bigger, Better, Faster (BBF), that achieves super-human performance on the Atari 100K benchmark on a single GPU.
In a new paper, How Does Generative Retrieval Scale to Millions of Passages?, a research team from Google Research and University of Waterloo performs the first empirical study of generative retrieval across various corpus scales, scaling up to the entire MS MARCO passage ranking task with its 8.8M passages, to provide insights on scaling generative retrieval to millions of passages.
In the new paper DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining, a research team from Google and Stanford University introduces Domain Reweighting with Minimax Optimization (DoReMi), a domain weight optimization strategy that leverages distributionally robust optimization (DRO) to substantially speed up effective language model pretraining.
In the new paper Large Language Models as Tool Makers, a research team from Google DeepMind, Princeton University and Stanford University presents LATM (large language models as tool makers), a closed-loop framework that enables LLMs to create their own reusable tools to boost efficiency and enhance their problem-solving capabilities.
In the new paper READ: Recurrent Adaptation of Large Transformers, a Meta AI research team proposes REcurrent ADaption (READ), a lightweight and memory-efficient fine-tuning approach that achieves a 56 percent reduction in memory consumption and an 84 percent reduction in GPU use.
In the new paper ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities, a research team from Alibaba Group’s DAMO Academy and the Huazhong University of Science and Technology releases ONE-PEACE, a highly extensible model that can align and integrate representations across vision, audio, and language modalities, opening a path toward the creation of a general representation model for unlimited modalities.
In the new paper CodeT5+: Open Code Large Language Models for Code Understanding and Generation, a Salesforce AI Research team presents CodeT5+, a novel family of encoder-decoder code foundation large language models that can be flexibly adapted to a wide range of code understanding and generation tasks and perform strongly on various code-related benchmarks.
In the new paper StarCoder: May the Source Be With You!, the BigCode community releases StarCoder and StarCoderBase, 15.5B parameter open-access large language models (LLMs) trained on 80+ programming languages. StarCoderBase outperforms all multi-programming-language code LLMs, and StarCoder surpasses all models fine-tuned on Python.
In the new paper VideoChat: Chat-Centric Video Understanding, a research team from Shanghai AI Laboratory, Nanjing University, the University of Hong Kong, and the Chinese Academy of Sciences presents VideoChat, a groundbreaking end-to-end chat-centric video understanding system that leverages state-of-the-art video and language models to improve spatiotemporal reasoning, event localization, and causal relationship inference.
In the new paper ZipIt! Merging Models from Different Tasks Without Training, a Georgia Tech research team proposes ZipIt!, a general method that exploits redundant features to combine two or more models with the same architecture but trained on different tasks into one multi-task model without additional training.
In the new paper Automatic Prompt Optimization with “Gradient Descent” and Beam Search, a Microsoft research team presents Automatic Prompt Optimization, a simple and general prompt optimization algorithm that automatically improves prompts for large language models, significantly reducing the time and energy spent on manual prompting approaches.
In the new paper Finding Neurons in a Haystack: Case Studies with Sparse Probing, a research team from MIT, Harvard University and Northeastern University proposes sparse probing, a technique that probes over 100 features to precisely localize the neurons in large language models that are relevant to a specific feature or concept.
In the new paper Unlimiformer: Long-Range Transformers With Unlimited Length Input, a Carnegie Mellon University research team presents a general approach for improving model performance by augmenting pretrained encoder-decoder transformers with an external datastore to permit inputs of unbounded length.