A team from Google Research proposes prediction depth, a new measure of example difficulty determined from hidden embeddings. Their study reveals the surprising fact that the prediction depth of a given input has strong connections to a model’s uncertainty, confidence, accuracy and speed of learning for that data point.
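The paper determines prediction depth with k-NN probes built on each layer’s embeddings: roughly, the earliest layer from which the probes’ predictions settle on the model’s final prediction. Below is a minimal sketch of that reading; the helper names, the choice of k and the convergence rule are illustrative assumptions, not the authors’ code.

```python
# Minimal sketch of prediction depth via per-layer k-NN probes.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def prediction_depth(layer_embeddings, labels, query_embeddings, final_pred, k=30):
    """layer_embeddings: list of (n_train, d_l) arrays, one per layer.
    query_embeddings: list of (d_l,) arrays for one input, one per layer.
    final_pred: the model's final prediction for that input.
    Returns the earliest layer from which every k-NN probe agrees with
    the final prediction (one reading of 'prediction depth')."""
    probe_preds = []
    for E, q in zip(layer_embeddings, query_embeddings):
        knn = KNeighborsClassifier(n_neighbors=k).fit(E, labels)
        probe_preds.append(knn.predict(q[None, :])[0])
    depth = len(probe_preds)  # convention: never converges -> max depth
    for l in range(len(probe_preds)):
        if all(p == final_pred for p in probe_preds[l:]):
            depth = l
            break
    return depth
```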
Researchers from Google conduct a survey on how to make Deep Learning models smaller, faster, and better. The team focuses on core areas of model efficiency, from modelling techniques to hardware support, and open-sources an experiment-based guide and code to help practitioners optimize their model training and deployment.
A research team from Facebook AI Research and Mila – McGill University explores deep learning model accuracy versus time trade-offs in anytime learning, which they term Anytime Learning at Macroscale (ALMA). The team evaluates various models to gain insights into how to strike different trade-offs between accuracy and time to obtain a good learner.
A research team from Princeton University, the Institute of Applied Physics and Computational Mathematics and the Beijing Institute of Big Data Research uses the Deep Potential (DP) method to predict the phase diagram of water from ab initio quantum theory, from low temperature and pressure to about 2400 K and 50 GPa. The paper was published in leading physics journal Physical Review Letters and represents an important milestone in the application of DP.
A research team from New York University and Google Research explores whether knowledge distillation really works, showing that a surprisingly large discrepancy often remains between the predictive distributions of the teacher and student models, even when the student has the capacity to perfectly match the teacher.
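For context, the standard distillation objective trains the student to match the teacher’s softened predictive distribution, and fidelity can be measured as top-1 agreement between the two models. Here is a minimal sketch of both quantities, assuming logits from arbitrary teacher and student models; the temperature is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Standard distillation loss: KL(teacher || student) on softened distributions.
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def agreement(student_logits, teacher_logits):
    # Top-1 agreement: the fidelity measure that can remain surprisingly
    # low even when the student has the capacity to match the teacher.
    return (student_logits.argmax(-1) == teacher_logits.argmax(-1)).float().mean()
```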
A research team from Mila, McGill University, Université de Montréal, DeepMind and Microsoft proposes GFlowNet, a novel flow network-based generative method that can turn a given positive reward into a generative policy that samples with a probability proportional to the return.
A Google Research team proposes MergeDistill, a framework for merging multiple pretrained monolingual/multilingual teacher LMs into a single task-agnostic multilingual student LM, leveraging the capabilities of powerful language-specific LMs while remaining multilingual and enabling positive language transfer.
A research team from McGill University, Université de Montréal, DeepMind and Mila presents an end-to-end, model-based deep reinforcement learning (RL) agent that dynamically attends to relevant parts of its environment to facilitate out-of-distribution (OOD) and systematic generalization.
An IEEE team provides a comprehensive overview of the bottom-up and top-down design approaches toward neuromorphic intelligence, highlighting the different levels of granularity present in existing silicon implementations and assessing the benefits of the different circuit design styles in neural processing systems.
A research team from UC Berkeley, Facebook AI Research and Google Brain abstracts Reinforcement Learning (RL) as a sequence modelling problem. Their proposed Decision Transformer simply outputs optimal actions by leveraging a causally masked transformer, yet matches or exceeds state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
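A minimal sketch of the Decision Transformer idea follows: returns-to-go, states and actions are embedded, interleaved into a single sequence, and fed through a causally masked transformer that predicts actions. All sizes and layer counts here are illustrative, and details such as timestep embeddings are omitted.

```python
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    """Sketch of RL as sequence modelling: predict a_t from the
    (return-to-go, state, action) token stream with a causal mask."""
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=3, n_heads=4):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T, _ = states.shape
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)  # interleaved: R_1, s_1, a_1, R_2, s_2, a_2, ...
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.transformer(tokens, mask=mask)
        # Predict a_t from the hidden state at the s_t position.
        return self.predict_action(h[:, 1::3])
```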
A research team from Google Brain conducts a comprehensive empirical study on more than fifty choices in a generic adversarial imitation learning framework and explores their impacts on large-scale (>500k trained agents) continuous-control tasks to provide practical insights and recommendations for designing novel and effective AIL algorithms.
A research team from OneFlow and Microsoft takes a step toward automatic deep neural network structure design, exploring unsupervised structure learning that leverages the efficient coding principle, information theory and computational neuroscience to learn network structure without label information.
A research team from Google Cloud AI, Google Research and Rutgers University simplifies vision transformers’ complex design, proposing nested transformers (NesT) that simply stack basic transformer layers to process non-overlapping image blocks individually. The approach achieves superior ImageNet classification accuracy and improves model training efficiency.
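The core trick is easy to sketch: partition the token grid into non-overlapping blocks, run ordinary transformer layers within each block independently, and merge blocks between hierarchy levels. The sketch below shows one such level under assumed dimensions; NesT’s pooled block-aggregation step between levels is omitted.

```python
import torch
import torch.nn as nn

class NestedBlockLayer(nn.Module):
    """One level of the nested-transformer idea: local attention within
    non-overlapping blocks of the token grid. Block size and dims are
    illustrative, not the paper's configuration."""
    def __init__(self, d_model=96, n_heads=3, block=4):
        super().__init__()
        self.block = block
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.local_transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        # x: (B, H, W, d) token grid; assumes H and W divisible by block size.
        B, H, W, d = x.shape
        b = self.block
        # Partition into non-overlapping b x b blocks and flatten each block.
        x = x.reshape(B, H // b, b, W // b, b, d).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B * (H // b) * (W // b), b * b, d)
        x = self.local_transformer(x)  # attention stays inside each block
        # Un-partition back to the (B, H, W, d) grid.
        x = x.reshape(B, H // b, W // b, b, b, d).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, H, W, d)
```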
A research team from New York University, Facebook AI, and a CIFAR Fellow in Learning in Machines & Brains raises doubts regarding large-scale pretrained language models’ few-shot learning abilities. The researchers re-evaluate such abilities when no held-out examples are available for model selection, a setting they propose constitutes “true few-shot learning.”
A research team from UC Davis, Microsoft Research and Johns Hopkins University extends work that probes models trained on massive amounts of linguistic data for the grammatical structures in their representations to the domain of mathematical reasoning, showing that both the standard transformer and the TP-Transformer can compose the meanings of mathematical symbols based on their structured relationships.
A research team from the University of Montreal and Max Planck Institute for Intelligent Systems constructs a reinforcement learning agent whose knowledge and reward function can be reused across tasks, along with an attention mechanism that dynamically selects which pieces of this unchanging knowledge to use, enabling out-of-distribution adaptation and generalization.
A research team from ETH Zürich and Microsoft presents a systematic, comparative study of distributed ML training over serverless infrastructures (FaaS) and “serverful” infrastructures (IaaS), aiming to understand the system tradeoffs of distributed ML training with serverless infrastructures.
A research team from ETH Zürich presents an overview of priors for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks. The researchers argue that well-chosen priors can yield desirable theoretical and empirical properties, supporting uncertainty estimation, model selection and optimal decision-making, and provide guidance on how to choose them.
A research team from Facebook shows how transfer learning enables pretraining on non-IDE, non-autocompletion and different-language code sequences before fine-tuning on the autocompletion prediction task, improving model accuracy by over 50 percent on very small fine-tuning datasets and by over 10 percent with 50k labelled examples.
A research team from Google proposes GSPMD, an automatic parallelism system for ML computation graphs that uses simple tensor sharding annotations to achieve different parallelism paradigms in a unified way, including data parallelism, within-layer model parallelism, spatial partitioning, weight-update sharding, optimizer-state sharding and pipeline parallelism.
A research team from DeepMind and Onshape combines a general-purpose language modelling technique and an off-the-shelf data serialization protocol to propose a machine learning model that can automatically generate high-quality sketches for Computer-Aided Design.
A research team from MIT and MIT-IBM Watson AI Lab proposes Curious Representation Learning (CRL), a framework that learns to understand the surrounding environment by training a reinforcement learning (RL) agent to maximize the error of a representation learner, which gives the agent an incentive to explore the environment.
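The training loop is a minimax game: the representation learner minimizes its self-supervised loss on the observations the agent collects, while the agent is rewarded by that same loss. In this minimal sketch a reconstruction loss stands in for the self-supervised objectives used in the paper, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class ReprLearner(nn.Module):
    """Stand-in representation learner: a tiny autoencoder whose
    per-observation reconstruction error doubles as the agent's reward."""
    def __init__(self, obs_dim=64, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(obs_dim, z_dim)
        self.dec = nn.Linear(z_dim, obs_dim)

    def loss(self, obs):
        return ((self.dec(self.enc(obs)) - obs) ** 2).mean(dim=-1)

learner = ReprLearner()
opt = torch.optim.Adam(learner.parameters(), lr=1e-3)

def crl_step(obs_batch):
    per_obs_loss = learner.loss(obs_batch)
    # Intrinsic reward for the RL agent: the learner's error (detached),
    # so the agent seeks observations the learner handles poorly.
    intrinsic_reward = per_obs_loss.detach()
    # The learner minimizes the same quantity on the collected data.
    opt.zero_grad()
    per_obs_loss.mean().backward()
    opt.step()
    return intrinsic_reward
```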
A research team from Facebook AI conducts a large-scale study on unsupervised spatiotemporal representation learning from videos. The work takes a unified perspective on four recent image-based frameworks (MoCo, SimCLR, BYOL, SwAV) and investigates a simple objective that can easily generalize unsupervised representation learning methodologies to space-time.
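The shared objective, extended to space-time, treats clips sampled from the same video as positives. A minimal InfoNCE-style sketch is shown here, with clips from other videos in the batch serving as negatives (the batch-negative setup of SimCLR/MoCo rather than BYOL’s or SwAV’s variants).

```python
import torch
import torch.nn.functional as F

def clip_info_nce(z1, z2, temperature=0.1):
    """z1, z2: (B, d) embeddings of two clips sampled from each of B
    videos. Positives sit on the diagonal of the similarity matrix;
    every other pairing in the batch acts as a negative."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature   # (B, B) clip-to-clip similarities
    targets = torch.arange(z1.size(0))   # matching clips share an index
    return F.cross_entropy(logits, targets)
```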
Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to geometrically unify the typical architectures of CNNs, GNNs, LSTMs, Transformers, etc. from the perspective of symmetry and invariance to build an “Erlangen Programme” for deep neural networks.
A research team from Huawei Noah’s Ark Lab and Tsinghua University proposes Extract Then Distill (ETD), a generic and flexible strategy for reusing teacher model parameters for efficient and effective task-agnostic distillation that can be applied to student models of any size.
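A minimal sketch of the “extract” step: score teacher neurons by importance and reuse the top-scoring rows to initialize a narrower student layer before distillation. Plain weight magnitude stands in here for the paper’s importance scoring, and the layer shapes are illustrative.

```python
import torch

def extract_student_layer(w_teacher, b_teacher, n_keep):
    """Pick the most important teacher neurons and reuse their weights
    to initialize a narrower student layer before distillation."""
    importance = w_teacher.abs().sum(dim=1)   # one score per output neuron
    keep = importance.topk(n_keep).indices
    return w_teacher[keep].clone(), b_teacher[keep].clone()

# Usage: shrink a 768-wide teacher layer to a 384-wide student layer.
w_t, b_t = torch.randn(768, 768), torch.randn(768)
w_s, b_s = extract_student_layer(w_t, b_t, n_keep=384)
```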