In the new paper Mnemosyne: Learning to Train Transformers with Transformers, a research team from Google and Columbia University presents Mnemosyne Optimizer, a learning-to-learn system for training entire neural network architectures without any task-specific optimizer tuning.
In the new paper Mathematical Capabilities of ChatGPT, an international research team tests ChatGPT’s mathematical capabilities and evaluates its suitability as an assistant to professional mathematicians. The team concludes that despite the glowing reviews in mainstream media, ChatGPT’s mathematical abilities “are significantly below those of an average mathematics graduate student.”
In the new paper DetectGPT: Zero-Shot Machine-Generated Text Detection Using Probability Curvature, a Stanford University research team presents DetectGPT, a zero-shot machine-generated text detection algorithm that uses probability curvature to predict whether a candidate passage was generated by a large language model.
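The test statistic is simple enough to sketch: machine-generated text tends to sit near a local maximum of the source model's log-likelihood, so lightly perturbing it lowers the score more than perturbing human-written text does. Below is a minimal Python sketch of the normalized perturbation discrepancy, where `log_prob` and `perturb` are assumed helpers (in the paper, perturbations come from T5 mask-filling):

```python
import numpy as np

def detectgpt_score(text, log_prob, perturb, n_perturbations=20):
    """Normalized perturbation discrepancy, DetectGPT's test statistic.

    log_prob(text) -> log-likelihood of `text` under the source LM (assumed helper).
    perturb(text)  -> a lightly rewritten version of `text` (assumed helper;
                      the paper fills T5-masked spans to produce these).
    """
    original = log_prob(text)
    perturbed = np.array([log_prob(perturb(text)) for _ in range(n_perturbations)])
    # Model-generated text sits near a local maximum of log p, so perturbations
    # lower its score more than they lower the score of human-written text.
    return (original - perturbed.mean()) / (perturbed.std() + 1e-8)
```

A passage is then flagged as machine-generated when its score exceeds a threshold chosen on held-out data.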
In the new paper MusicLM: Generating Music From Text, a Google Research and Sorbonne University team presents MusicLM, a model for generating high-fidelity music that can be conditioned on both text and melody. MusicLM surpasses baselines in both its audio quality and adherence to the text descriptions.
In the new paper ClimaX: A Foundation Model for Weather and Climate, a team from Microsoft Autonomous Systems and Robotics Research, Microsoft Research AI4Science and the University of California at Los Angeles presents ClimaX, a foundation model for weather and climate that can be efficiently adapted for general-purpose tasks related to the Earth’s atmosphere.
A Stanford University research team presents a brain-computer interface for translating speech-related neural activity into text (speech BCI) in the new paper A High-performance Speech Neuroprosthesis. Theirs is the first speech BCI to record impulse activity from intracortical microelectrode arrays and could benefit people unable to produce clear utterances due to conditions such as stroke and ALS.
In the new paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets, an Oxford University research team introduces Deep Double Duelling Q-learning with the APEX architecture to train a trading agent to translate predictive signals into optimal limit order trading strategies.
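For readers unfamiliar with the duelling decomposition the agent builds on, here is a minimal PyTorch sketch (ours, not the paper's code) of a duelling Q-network head, which splits the Q-value into a state-value stream and an advantage stream:

```python
import torch
import torch.nn as nn

class DuellingQNetwork(nn.Module):
    """Duelling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        adv = self.advantage(h)
        # Subtracting the mean advantage makes the V/A decomposition identifiable.
        return self.value(h) + adv - adv.mean(dim=-1, keepdim=True)

# Double Q-learning then forms the bootstrap target by evaluating the online
# network's greedy action with a separate target network:
#   y = r + gamma * Q_target(s', argmax_a Q_online(s', a))
```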
In the new paper Continual Few-Shot Learning Using HyperTransformers, a Google Research team proposes Continual HyperTransformer, which modifies the recently published HyperTransformer few-shot learning method to sequentially update a convolutional neural network’s weights based on the information in a new task without forgetting the knowledge it learned from previous tasks.
In the new paper Tracr: Compiled Transformers as a Laboratory for Interpretability, a research team from ETH Zurich and DeepMind presents Tracr, a compiler that addresses the absence of ground truth explanations in deep neural network models by “compiling” human-readable code to the weights of a transformer model.
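To see what "compiling code to weights" means, it helps to look at the RASP-style primitives Tracr takes as input. The toy Python below mimics RASP's `select`/`aggregate` semantics in plain lists (illustration only, not the actual Tracr API): `select` builds a boolean attention pattern and `aggregate` averages values under it, which is exactly the computation an attention head performs.

```python
# Toy RASP-style primitives in plain Python (not the real Tracr API).

def select(keys, queries, predicate):
    """Boolean attention pattern: which key positions each query attends to."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(pattern, values):
    """Average the selected values per query position (hard 0/1 attention)."""
    out = []
    for row in pattern:
        picked = [v for v, keep in zip(values, row) if keep]
        out.append(sum(picked) / len(picked) if picked else 0.0)
    return out

tokens = ["a", "b", "a", "c", "a"]
positions = range(len(tokens))
# Classic RASP program: the running fraction of "a" tokens seen so far.
pattern = select(positions, positions, lambda k, q: k <= q)
print(aggregate(pattern, [1.0 if t == "a" else 0.0 for t in tokens]))
# -> [1.0, 0.5, 0.666..., 0.5, 0.6]
```

Tracr translates programs like this into actual attention and MLP weights, yielding transformers whose ground-truth mechanism is known by construction.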
In the new paper Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling, a research team from Peking University, ByteDance, and the University of Oxford presents Sparse Masked Modeling with Hierarchy (SparK), the first BERT-style pretraining approach that can be used on convolutional models without any backbone modifications.
In the new paper Memory Augmented Large Language Models are Computationally Universal, Google Brain and University of Alberta researcher Dale Schuurmans establishes computational universality for a large language model augmented with an associative read-write memory.
In the new paper Unlocking de Novo Antibody Design With Generative Artificial Intelligence, researchers from Absci Corporation leverage the power of generative artificial intelligence for de novo antibody design in a zero-shot and controllable manner, dramatically reducing time and resource requirements for the task.
In the new paper Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers, a Microsoft research team presents VALL-E, the first language model-based text-to-speech (TTS) system with strong in-context learning. VALL-E achieves state-of-the-art personalized speech synthesis quality via prompting in a zero-shot setting.
Baidu, Inc. today hosted its annual flagship developer conference, Baidu Create 2022. At the event, the company offered an in-depth exploration of its research and its analysis of future technology trends, covering a range of emerging technologies including artificial intelligence, autonomous driving, intelligent search, quantum computing and AI scientific computing.
In the new paper Muse: Text-To-Image Generation via Masked Generative Transformers, a Google Research team introduces Muse, a transformer-based text-to-image synthesis model that leverages masked image modelling to achieve state-of-the-art performance while being significantly faster than diffusion or autoregressive models.
Colossal-AI (https://github.com/hpcaitech/ColossalAI), the widely used open-source library for training, inference and fine-tuning of large deep learning models, has released a new automatic parallelism feature that reduces hardware costs by up to 46 times for AI-Generated Content (AIGC) solutions.
In the new paper Hungry Hungry Hippos: Towards Language Modeling with State Space Models, Stanford University and State University of New York at Buffalo researchers explore the expressivity gap between state space models and transformer language model attention mechanisms and propose FlashConv to improve state space model training efficiency on modern hardware.
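Under the hood, a state space layer reduces to a long convolution of the input with a kernel induced by the SSM parameters, and FlashConv accelerates exactly this primitive. A NumPy sketch of the FFT-based convolution (the math only; FlashConv's contribution is making it fast on GPUs with fused kernels and a block decomposition), with a toy decaying kernel standing in for a real SSM kernel:

```python
import numpy as np

def fft_long_conv(u, k):
    """Causal convolution y = k * u via FFT in O(L log L) time."""
    L = len(u)
    n = 2 * L  # zero-pad so circular FFT convolution matches linear convolution
    return np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)[:L]

u = np.random.randn(1024)                 # input sequence
k = np.exp(-0.01 * np.arange(1024))       # toy decaying kernel (stand-in for an SSM kernel)
y = fft_long_conv(u, k)
```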
In the new paper OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization, a Meta AI research team presents OPT-IML Bench, an Instruction Meta Learning benchmark comprising 2000 NLP tasks and an evaluation framework for model generalization.
In the new paper GraphCast: Learning Skillful Medium-Range Global Weather Forecasting, a research team from DeepMind and Google presents GraphCast, a machine-learning (ML)-based weather simulator that scales well with data and can generate a 10-day forecast in under 60 seconds. GraphCast outperforms the world’s most accurate deterministic operational medium-range weather forecasting system and surpasses existing ML-based baselines.
In the new paper Point-E: A System for Generating 3D Point Clouds from Complex Prompts, an OpenAI research team presents Point·E, a system for text-conditional synthesis of 3D point clouds that leverages diffusion models to generate diverse and complex 3D shapes conditioned on complex text prompts in minutes on a single GPU.
In the new paper Structured Prompting: Scaling In-Context Learning to 1,000 Examples, a Microsoft Research team proposes structured prompting. The novel approach breaks through conventional in-context learning length limits, scaling to thousands of examples with reduced computational complexity and superior performance and stability.
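Roughly, the method encodes groups of demonstrations independently, so encoding cost grows linearly rather than quadratically with the number of examples, then lets the test input attend to all groups at once with the grouped attention scores downweighted. A simplified single-head NumPy sketch of this rescaled attention (our reading of the idea; the paper's right-aligned position embeddings are omitted):

```python
import numpy as np

def rescaled_attention(q, local_k, local_v, group_ks, group_vs):
    """Fuse M independently encoded demonstration groups in one softmax.

    Grouped scores are downweighted by 1/M so that adding more groups does
    not drown out the test input's own tokens. Simplified sketch, not the
    paper's implementation.
    """
    M = len(group_ks)
    ctx_logits = np.concatenate([k @ q for k in group_ks])  # scores vs. demo tokens
    loc_logits = local_k @ q                                # scores vs. own tokens
    m = max(ctx_logits.max(), loc_logits.max())             # stabilize the softmax
    w = np.concatenate([np.exp(ctx_logits - m) / M, np.exp(loc_logits - m)])
    w /= w.sum()
    return w @ np.concatenate(group_vs + [local_v], axis=0)
```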
In the new paper The Alignment Problem From a Deep Learning Perspective, a research team from OpenAI, UC Berkeley and the University of Oxford examines the alignment problem with regard to deep learning, identifying potential issues and how we might mitigate them.
In the new paper What Do Vision Transformers Learn? A Visual Exploration, a research team from the University of Maryland and New York University uses large-scale feature visualizations from a wide range of vision transformers to gain insights into what they learn from images and how they differ from convolutional neural networks.
In the new paper Text Embeddings by Weakly-Supervised Contrastive Pre-training, a Microsoft research team introduces Embeddings from Bidirectional Encoder Representations (E5), a general-purpose text embedding model for tasks requiring a single-vector representation of texts and the first model to surpass the BM25 baseline on the BEIR retrieval benchmark under a zero-shot setting.
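The pre-training objective is a standard contrastive one: paired texts should embed close together while in-batch alternatives are pushed apart. A minimal PyTorch sketch of InfoNCE with in-batch negatives (simplified; E5 also adds a fine-tuning stage with mined hard negatives):

```python
import torch
import torch.nn.functional as F

def infonce_loss(q_emb, d_emb, temperature=0.01):
    """InfoNCE with in-batch negatives.

    q_emb, d_emb: (B, d) L2-normalized embeddings of paired texts, so each
    row's positive is the same-index row of the other matrix; all other
    rows in the batch serve as negatives.
    """
    logits = q_emb @ d_emb.T / temperature   # (B, B) scaled cosine similarities
    labels = torch.arange(q_emb.size(0))     # the diagonal holds the positives
    return F.cross_entropy(logits, labels)
```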
In the new paper The Stack: 3 TB of Permissively Licensed Source Code, a team from ServiceNow Research and Hugging Face advances open and responsible research on code LLMs by releasing The Stack, a 3.1 TB dataset of permissively licensed source code in 30 programming languages.
Turing Award winner and deep learning pioneer Geoffrey Hinton, one of the original proponents of backpropagation, has argued in recent years that backpropagation does not explain how the brain works. In his NeurIPS 2022 keynote speech, Hinton proposes a new approach to neural network learning: the Forward-Forward algorithm.
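The Forward-Forward algorithm replaces the backward pass with two forward passes, one on real ("positive") data and one on corrupted ("negative") data, training each layer locally so that its "goodness" (the sum of squared activations) is high on positive inputs and low on negative ones. A minimal PyTorch sketch of one such layer, following the keynote's recipe:

```python
import torch
import torch.nn.functional as F

class FFLayer(torch.nn.Module):
    """One Forward-Forward layer, trained locally with no backward pass
    through the rest of the network (a sketch, not Hinton's code)."""

    def __init__(self, d_in, d_out, lr=0.03, theta=2.0):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)
        self.theta = theta  # goodness threshold

    def forward(self, x):
        # Normalize the input so the previous layer's goodness cannot leak through.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(1)  # goodness on positive data
        g_neg = self.forward(x_neg).pow(2).sum(1)  # goodness on negative data
        # Push positive goodness above theta and negative goodness below it.
        loss = F.softplus(torch.cat([self.theta - g_pos, g_neg - self.theta])).mean()
        self.opt.zero_grad(); loss.backward(); self.opt.step()
        # Hand detached activations to the next layer, which trains the same way.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```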
In the new paper Transformer-Based Learned Optimization, a Google Research and Lund University team presents Optimus, an expressive neural network architecture for learned optimization that captures complex dependencies in the parameter space and achieves competitive results on real-world tasks and benchmark optimization problems.
In the new paper Fine-tuning Language Models To Find Agreement Among Humans With Diverse Preferences, a research team from DeepMind and University College London fine-tunes a 70-billion-parameter language model to generate statements that maximize agreement among a human group with diverse written opinions.
In the new paper Compressing Volumetric Radiance Fields to 1 MB, an Alibaba Group research team proposes vector quantized radiance fields (VQRF), a simple yet efficient framework for compressing volumetric radiance fields that achieves up to 100x storage reduction, shrinking the original grid model to around 1 MB with negligible loss in rendering quality.
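The central trick is classic vector quantization: store one small integer codebook index per voxel instead of a full feature vector. A NumPy sketch of the quantization step (simplified; VQRF additionally prunes unimportant voxels and jointly finetunes the quantized model):

```python
import numpy as np

def vector_quantize(grid_feats, codebook):
    """Map each voxel feature to its nearest codebook entry.

    grid_feats: (N, d) per-voxel features; codebook: (K, d) learned codes.
    Storage falls from N*d floats to N small integers plus the codebook,
    which is the core of the compression.
    """
    d2 = ((grid_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d2.argmin(axis=1)                 # nearest code index per voxel
    return idx.astype(np.uint16), codebook[idx]
```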
In the new paper Convexifying Transformers: Improving Optimization and Understanding of Transformer Networks, a Stanford University and Google Research team provides a solid theoretical analysis of transformers’ fundamental mechanisms and introduces a novel convex analytic training framework for improving their optimization.
In the new paper Solving Math Word Problems With Process- and Outcome-based Feedback, a DeepMind research team conducts the first comprehensive comparison between process- and outcome-based model supervision. The two approaches achieve comparable final-answer error rate improvements on math word problems, while the process-based method significantly reduces reasoning errors from 14.0 to just 3.4 percent.
In the new paper I Can’t Believe There’s No Images! Learning Visual Tasks Using only Language Data, an Allen Institute for Artificial Intelligence team proposes Cross Modal Transfer On Semantic Embeddings (CLOSE), an approach that learns high-level skills from textual data, then uses these skills to complete vision tasks without additional visual training data.
In the NeurIPS 2022 Outstanding Paper Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning, a research team from Stanford University, University of Tübingen and Meta AI demonstrates in theory and practice how data pruning techniques can break beyond the power law scaling of error versus dataset size.
In the NeurIPS 2022 Outstanding Paper Gradient Descent: The Ultimate Optimizer, MIT CSAIL and Meta researchers present a novel technique that enables gradient descent optimizers such as SGD and Adam to tune their hyperparameters automatically. The method requires no manual differentiation and can be stacked recursively to many levels.
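The idea is easiest to see one level deep for SGD: since w_t = w_{t-1} - lr·g_{t-1}, the loss gradient with respect to the learning rate is -g_t·g_{t-1}, so the learning rate can take its own gradient step. A NumPy sketch of this single-level case (the paper obtains the same quantity by automatic differentiation, with no manual derivation, and stacks hyper-optimizers recursively so each level's hyperparameters are tuned in turn):

```python
import numpy as np

def sgd_with_hypergradient(w, grad_fn, lr=0.01, kappa=1e-4, steps=100):
    """SGD whose learning rate takes its own gradient steps.

    Because w_t = w_{t-1} - lr * g_{t-1}, the loss gradient w.r.t. lr is
    -g_t . g_{t-1}; descending it yields the update below. One-level
    sketch under our simplifying assumptions, not the paper's autodiff code.
    """
    g_prev = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)                    # gradient at the current iterate
        lr += kappa * float(g @ g_prev)   # hypergradient step on the learning rate
        w = w - lr * g                    # ordinary SGD step with the tuned rate
        g_prev = g
    return w, lr

# Example: minimizing f(w) = ||w||^2 / 2, whose gradient is w itself.
w_opt, lr_final = sgd_with_hypergradient(np.ones(10), lambda w: w)
```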