In the new paper The Alignment Problem From a Deep Learning Perspective, a research team from OpenAI, UC Berkeley and the University of Oxford examines the alignment problem with regard to deep learning, identifying potential issues and discussing how they might be mitigated.
In the new paper DeepDPM: Deep Clustering With an Unknown Number of Clusters, a research team from Ben-Gurion University of the Negev presents DeepDPM, an effective deep nonparametric approach that infers the number of clusters during training instead of requiring it to be predefined.
In the new paper DataMUX: Data Multiplexing for Neural Networks, a Princeton University research team proposes Data Multiplexing (DataMUX). The novel technique enables neural networks to process multiple inputs simultaneously and generate accurate predictions, increasing model throughput with minimal additional memory requirements.
A Google Brain research team introduces EvoJAX, a JAX-based, scalable, general-purpose, hardware-accelerated neuroevolution toolkit that enables neuroevolution algorithms to work with neural networks running in parallel across multiple TPUs or GPUs and achieves significant training speedups.
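To make "neuroevolution algorithm" concrete, here is a minimal sketch of a basic Gaussian evolution strategy in plain Python. It does not use EvoJAX's API; the toy objective, `TARGET`, and all hyperparameters are hypothetical. The loop samples perturbations around a parameter vector, scores each candidate, and moves the vector along a Monte Carlo estimate of the fitness gradient. Every candidate evaluation is independent, which is exactly the work a toolkit like EvoJAX can spread across accelerators.

```python
import random

TARGET = [0.5, -1.0, 2.0]  # hypothetical optimum for the toy problem

def fitness(params):
    # Toy objective (assumption): higher is better, maximized at TARGET.
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def simple_es(dim=3, pop_size=64, sigma=0.1, lr=0.05, generations=300, seed=0):
    rng = random.Random(seed)
    mean = [0.0] * dim
    for _ in range(generations):
        # Sample a population of Gaussian perturbations around the current mean.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
        # Each candidate is scored independently (the embarrassingly parallel step).
        scores = [fitness([m + sigma * e for m, e in zip(mean, eps)]) for eps in noise]
        # Subtract the population mean as a baseline to reduce gradient variance.
        baseline = sum(scores) / pop_size
        # Monte Carlo estimate of the gradient of expected fitness w.r.t. the mean.
        grad = [
            sum((s - baseline) * eps[i] for s, eps in zip(scores, noise)) / (pop_size * sigma)
            for i in range(dim)
        ]
        mean = [m + lr * g for m, g in zip(mean, grad)]
    return mean
```

For a real network, `params` would be the flattened weights and `fitness` would be an episode return or validation score.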
A research team from UC Berkeley, Amazon Web Services, Google, Shanghai Jiao Tong University and Duke University proposes Alpa, a compiler system for distributed deep learning on GPU clusters that automatically generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on the models they were designed for.
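One ingredient of automatic parallelization planning is deciding how to slice a model's layers into pipeline stages. The sketch below is a heavily simplified, hypothetical version of that idea, not Alpa's actual algorithm (which also searches intra-operator sharding and device-mesh assignment). Given per-layer costs, it uses dynamic programming to split the layers into contiguous stages so that the slowest stage, which bounds pipeline throughput, is as fast as possible.

```python
from functools import lru_cache

def partition_layers(costs, num_stages):
    """Split per-layer costs into contiguous stages, minimizing the slowest stage."""
    n = len(costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")
    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(start, stages):
        # (bottleneck cost, cut points) for layers[start:] split into `stages` stages.
        if stages == 1:
            return prefix[n] - prefix[start], ()
        result = (float("inf"), ())
        # Leave at least one layer for each remaining stage.
        for end in range(start + 1, n - stages + 2):
            tail_cost, cuts = best(end, stages - 1)
            candidate = (max(prefix[end] - prefix[start], tail_cost), (end,) + cuts)
            if candidate[0] < result[0]:
                result = candidate
        return result

    bottleneck, cuts = best(0, num_stages)
    bounds = (0,) + cuts + (n,)
    return bottleneck, [list(costs[a:b]) for a, b in zip(bounds, bounds[1:])]

print(partition_layers([4, 2, 3, 5, 1, 6], 3))  # (8, [[4, 2], [3, 5], [1, 6]])
```

The layer costs here are made-up numbers; a real planner would profile them on the target hardware.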
A research team from University Medical Center Freiburg, ML Collective, and Google Brain introduces SimpleBits, an information-reduction method that learns to synthesize simplified inputs that contain less information yet remain informative for the task, providing a new approach to exploring the basis of network decisions.
A Microsoft research team proposes DeepSpeed-MoE, which comprises a novel MoE architecture design and model compression technique that reduces MoE model size by up to 3.7x, and a highly optimized inference system that delivers 7.3x better latency and cost than existing MoE inference solutions.
Peng Cheng Laboratory (PCL) and Baidu release PCL-BAIDU Wenxin, the world’s first knowledge-enhanced 100-billion-scale pretrained language model and the largest Chinese-language monolithic model, with 260 billion parameters. PCL-BAIDU Wenxin achieves state-of-the-art results on more than 60 tasks and significantly improves results on more than 30 zero-shot and few-shot learning benchmarks.
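For readers new to mixture-of-experts models, the sketch below shows generic top-1 gating, the routing step that lets an MoE layer grow its parameter count while each token only runs through one expert. It is a hypothetical illustration, not DeepSpeed-MoE's architecture or inference kernels; `gate_weights` and the toy experts are made-up stand-ins for learned layers.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top1_moe(tokens, gate_weights, experts):
    """Route each token to its highest-probability expert, scaled by that probability."""
    outputs, assignments = [], []
    for x in tokens:
        # Gate logits: one dot product per expert.
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
        probs = softmax(logits)
        k = max(range(len(experts)), key=lambda i: probs[i])
        assignments.append(k)
        # Only the selected expert is evaluated for this token.
        outputs.append([probs[k] * y for y in experts[k](x)])
    return outputs, assignments

experts = [lambda x: [2 * v for v in x], lambda x: [-v for v in x]]
outputs, assignments = top1_moe([[3.0, 0.0], [0.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]], experts)
# assignments == [0, 1]
```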
DeepMind and Google Brain researchers and former World Chess Champion Vladimir Kramnik explore how AlphaZero acquires human chess knowledge and how chess concepts are represented in its neural network, using concept probing, behavioural analysis and an examination of its activations.
A DeepMind research team presents the One Pass ImageNet (OPIN) problem, designed to study the space and compute efficiency of deep learning in a streaming setting with constrained data storage and to develop model training systems where each example is passed to the system only once.
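The defining constraint of the one-pass setting is that each example is seen exactly once. The sketch below illustrates that constraint with plain single-pass SGD on a synthetic linear-regression stream; it is a hypothetical toy, not the OPIN benchmark or DeepMind's training setup (which also studies storing a limited number of examples for replay). Using a Python generator enforces the constraint naturally, since a generator can only be consumed once.

```python
import random

def example_stream(n, seed=0):
    # Hypothetical stream: y = 3x - 1 plus small noise; each example is yielded once.
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 3.0 * x - 1.0 + rng.gauss(0.0, 0.05)

def one_pass_sgd(stream, lr=0.05):
    w, b, seen = 0.0, 0.0, 0
    for x, y in stream:  # single pass: no epochs, no stored dataset
        err = (w * x + b) - y
        # Gradient step on 0.5 * err**2 with respect to w and b.
        w -= lr * err * x
        b -= lr * err
        seen += 1
    return w, b, seen
```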
A Microsoft Research India team presents Varuna, a system for training massive deep learning models on commodity networking that eliminates the need for specialized hyperclusters and alleviates the cost, scale, and resource utilization challenges of deep learning model training.
In the new paper Understanding How Encoder-Decoder Architectures Attend, researchers from the University of Washington, Google Blueshift Team and Google Brain Team propose a method for decomposing hidden states over a sequence into temporal- and input-driven components, revealing how attention matrices are formed in encoder-decoder networks.
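A decomposition of this kind can be sketched with simple averages. The version below assumes the temporal component at position t is the mean hidden state over all sequences at that position, the input-driven component for a token is the mean of what remains over all occurrences of that token, and the residual is everything else; it is an illustrative reading, not the authors' code, and hidden states are reduced to scalars for brevity.

```python
from collections import defaultdict

def decompose(hidden, tokens):
    """Split hidden[s][t] into temporal, input-driven, and residual parts."""
    num_seqs, length = len(hidden), len(hidden[0])
    # Temporal component: average over sequences at each position t.
    temporal = [sum(hidden[s][t] for s in range(num_seqs)) / num_seqs for t in range(length)]
    # Input-driven component: average the remainder, grouped by input token.
    sums, counts = defaultdict(float), defaultdict(int)
    for s in range(num_seqs):
        for t in range(length):
            sums[tokens[s][t]] += hidden[s][t] - temporal[t]
            counts[tokens[s][t]] += 1
    input_driven = {tok: sums[tok] / counts[tok] for tok in sums}
    # Residual: whatever neither component explains.
    residual = [[hidden[s][t] - temporal[t] - input_driven[tokens[s][t]]
                 for t in range(length)] for s in range(num_seqs)]
    return temporal, input_driven, residual

# Toy data built as (position effect) + (token effect), so the residual vanishes.
temporal, input_driven, residual = decompose([[11.0, 19.0], [9.0, 21.0]], [["a", "b"], ["b", "a"]])
```

By construction the three parts always sum back to the original hidden state, so the question becomes how much of each hidden state the first two parts explain.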
In the new paper Non-deep Networks, a research team from Princeton University and Intel Labs argues it is possible to achieve high performance with “non-deep” neural networks, presenting ParNet (Parallel Networks), a novel 12-layer architecture that achieves performance competitive with its state-of-the-art deep counterparts.
A Google AI research team explores zero-label learning (training with synthetic data only) in natural language processing, and introduces Unsupervised Data Generation (UDG), a training data creation procedure designed to synthesize high-quality training data without human annotations.
In a 200+ page paper, Percy Liang, Fei-Fei Li, and over 100 other researchers from the Stanford University Center for Research on Foundation Models (CRFM) systematically describe the opportunities and risks of large-scale pretrained “foundation” models. The unique study aims to provide a clearer understanding of how these models work, when and how they fail, and the various capabilities provided by their emergent properties.
A research team from Università di Firenze, Università di Siena, University of Cambridge and Université Côte d’Azur proposes a general approach to explainable artificial intelligence (XAI) in neural architectures, designing interpretable deep learning models called Logic Explained Networks (LENs). The novel approach yields better performance than established white-box models while providing more compact and meaningful explanations.
On August 5, WeChat AI and Beijing Jiaotong University system developers released the paper WeChat Neural Machine Translation Systems for WMT21, revealing the architecture of their novel neural machine translation (NMT) system and the strategies they adopted to achieve impressive performance in the WMT21 competition.
A research team from Zhejiang University, Wuhan University and Adobe Research proposes Feature Importance-Aware Attacks (FIA) that drastically improve the transferability of adversarial examples, achieving superior performance compared to state-of-the-art transferable attacks.
A DeepMind research team proposes Perceiver IO, a single network that can easily integrate and transform arbitrary information for arbitrary tasks while scaling linearly with both input and output sizes. The general architecture achieves outstanding results on tasks with highly structured output spaces, such as natural language and visual understanding.
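The linear scaling comes from routing everything through a fixed-size latent array. In the sketch below, a small set of latent vectors cross-attends to an input of any length, and a set of output queries cross-attends to the latents, so cost grows as (latents x inputs) plus (outputs x latents). This is a hypothetical simplification: learned query/key/value projections and the latent self-attention blocks are omitted, and the vectors are made up.

```python
import math

def cross_attend(queries, keys, values):
    """Scaled dot-product attention; cost is O(len(queries) * len(keys))."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) / total
                    for j in range(len(values[0]))])
    return out

def perceiver_io_sketch(inputs, latents, output_queries):
    # Encode: a fixed-size latent array reads the (possibly very long) input.
    z = cross_attend(latents, inputs, inputs)
    # Perceiver IO would apply self-attention blocks to z here; omitted in this sketch.
    # Decode: one query per desired output element reads from the latents.
    return cross_attend(output_queries, z, z)
```

Because the output is produced by queries rather than a fixed head, the same network can emit outputs of different sizes and structures just by changing `output_queries`.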
A research team from Google Research and Northwestern University presents polynomial-time and sample-efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, aiming to provide insights into whether efficient algorithms exist for learning ReLU networks.
A team from Google Research proposes prediction depth, a new measure of example difficulty determined from hidden embeddings. Their study reveals the surprising fact that the prediction depth of a given input has strong connections to a model’s uncertainty, confidence, accuracy and speed of learning for that data point.
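As a rough sketch of how such a measure can be computed: assume a k-nearest-neighbour probe is fit on each layer's hidden embeddings, and the prediction depth of an input is the earliest layer from which every probe agrees with the network's final prediction. The helper names below are hypothetical, and the value returned when no layer agrees (the number of layers) is an arbitrary convention chosen for this illustration.

```python
from collections import Counter

def knn_predict(query, support_embeddings, support_labels, k=3):
    # Majority vote among the k nearest support embeddings (squared Euclidean distance).
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, emb)), label)
        for emb, label in zip(support_embeddings, support_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def prediction_depth(probe_predictions, final_prediction):
    """Earliest layer index from which every k-NN probe agrees with the final prediction."""
    depth = len(probe_predictions)
    # Walk backwards from the last layer until a probe disagrees.
    for layer in range(len(probe_predictions) - 1, -1, -1):
        if probe_predictions[layer] != final_prediction:
            break
        depth = layer
    return depth
```

An "easy" example is settled in early layers and gets a small depth; a "hard" one keeps changing its probe prediction until late in the network.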
Researchers from Google conduct a survey on how to make deep learning models smaller, faster, and better. The team focuses on core areas of model efficiency, from modelling techniques to hardware support, and open-sources an experiment-based guide and code to help practitioners optimize their model training and deployment.
A research team from ETH Zürich presents an overview of priors for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks. The researchers argue that well-chosen priors can help achieve desirable theoretical and empirical properties, such as uncertainty estimation, model selection and optimal decision support, and they provide guidance on how to choose them.
Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to geometrically unify the typical architectures of CNNs, GNNs, LSTMs, Transformers, etc. from the perspective of symmetry and invariance to build an “Erlangen Programme” for deep neural networks.
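The symmetry viewpoint can be illustrated with the permutation group, the setting of GNNs and set models. In the toy sketch below, an equivariant layer updates each element from itself and the set mean, so permuting the input permutes the output in the same way; a sum-pooling readout then discards order entirely, giving an invariant output. The parameters `a` and `b` are arbitrary illustrative values, and this is a hypothetical example rather than code from the paper.

```python
import itertools

def equivariant_layer(xs, a=2.0, b=0.5):
    # Each element sees itself plus an order-independent summary (the mean).
    mean = sum(xs) / len(xs)
    return [a * x + b * mean for x in xs]

def invariant_readout(xs):
    # Sum pooling after the equivariant layer: the result ignores input order.
    return sum(equivariant_layer(xs))

xs = [1.0, 2.0, 4.0]
base = equivariant_layer(xs)
for perm in itertools.permutations(range(len(xs))):
    permuted = [xs[i] for i in perm]
    # Equivariance: permuting inputs permutes outputs identically.
    assert equivariant_layer(permuted) == [base[i] for i in perm]
```

Stacking equivariant layers and finishing with an invariant pooling step is the general pattern the geometric view uses to relate architectures such as GNNs and CNNs, with the symmetry group changing from permutations to, for example, translations.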