A research team from Facebook shows how the power of transfer learning can enable pretraining on non-IDE, non-autocompletion and different-language example code sequences before fine-tuning on the autocompletion prediction task to improve model accuracy by over 50 percent on very small fine-tuning datasets and over 10 percent on 50k labelled examples.
A research team from Google proposes GSPMD, an automatic parallelism system for ML computation graphs that uses simple tensor sharding annotations to achieve different parallelism paradigms in a unified way, including data parallelism, within-layer model parallelism, spatial partitioning, weight-update sharding, optimizer-state sharding and pipeline parallelism.
A research team from DeepMind and Onshape combines a general-purpose language modelling technique and an off-the-shelf data serialization protocol to propose a machine learning model that can automatically generate high-quality sketches for Computer-Aided Design.
A research team from MIT and MIT-IBM Watson AI Lab proposes Curious Representation Learning (CRL), a framework that learns to understand the surrounding environment by training a reinforcement learning (RL) agent to maximize the error of a representation learner to gain an incentive to explore the environment.
A research team from Facebook AI conducts a large-scale study on unsupervised spatiotemporal representation learning from videos. The work takes a unified perspective on four recent image-based frameworks (MoCo, SimCLR, BYOL, SwAV) and investigates a simple objective that can easily generalize unsupervised representation learning methodologies to space-time.
Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to geometrically unify the typical architectures of CNNs, GNNs, LSTMs, Transformers, etc. from the perspective of symmetry and invariance to build an “Erlangen Programme” for deep neural networks.
A research team from Huawei Noah’s Ark Lab and Tsinghua University proposes Extract Then Distill (ETD), a generic and flexible strategy for reusing teacher model parameters for efficient and effective task-agnostic distillation that can be applied to student models of any size.
Researchers from Carnegie Mellon University, the University of Texas at Austin and Facebook AI propose a novel paradigm to optimize widths for each CNN layer. The method is compatible across various width optimization algorithms and networks and achieves up to a 320x reduction in width optimization overhead without compromising top-1 accuracy on ImageNet.
IBM and ETH Zurich researchers make progress in reconciling neurophysiological insights with machine intelligence, proposing a novel biologically inspired optimizer for artificial (ANNs) and spiking neural networks (SNNs) that incorporates synaptic integration principles from biology. GRAPES (Group Responsibility for Adjusting the Propagation of Error Signals) leads to improvements in the training time convergence, accuracy and scalability of ANNs and SNNs.
A research team from Google Research proposes small, fast, on-device disfluency detection models based on the BERT architecture. The smallest model size is only 1.3 MiB, representing a size reduction of two orders of magnitude and an inference latency reduction of a factor of eight compared to state-of-the-art BERT-based models.
A research team from Google and the University of California, Berkeley calculates the energy use and carbon footprint of large-scale models T5, Meena, GShard, Switch Transformer and GPT-3, and identifies methods and publication guidelines that could help reduce their CO2e footprint.
A research team from McGill University, Mila – Quebec AI Institute and Facebook AI proposes novel metrics and perturbation functions to detect, quantify and compare trade-offs between robustness and faithfulness in NMT systems, both on the corpus level and with particular examples.
On April 19, Google Cloud and Siemens announced new cooperation to optimize factory processes and improve productivity on the shop floor. Siemens intends to integrate Google Cloud’s leading data cloud and artificial intelligence/machine learning (AI/ML) technologies with its factory automation solutions.
A research team from ETH Zurich leverages existing spike-based learning circuits to propose a biologically plausible architecture that is highly successful in classifying distinct and complex spatio-temporal spike patterns. The work contributes to the design of ultra-low-power mixed-signal neuromorphic processing systems capable of distinguishing spatio-temporal patterns in spiking activity.
A research team from NVIDIA, Stanford University and Microsoft Research propose a novel pipeline parallelism approach that improves throughput by more than 10 percent with a comparable memory footprint, showing such strategies can achieve high aggregate throughput while training models with up to a trillion parameters.
A research team from ETH and UC Berkeley proposes a Deep Reward Learning by Simulating the Past (Deep RLSP) algorithm that represents rewards directly as a linear combination of features learned through self-supervised representation learning and enables agents to simulate human actions backwards in time to infer what they must have done.
A research team from IBM introduces two systems for predicting information type: The TypeSuggest module, an unsupervised system designed to generate types for a set of seed query terms input by the user; and an Answer Type prediction module for predicting the correct answer type for user-provided questions.
A research team from Technical University of Munich, Google, Nvidia and LMU München proposes CodeTrans, an encoder-decoder transformer model which achieves state-of-the-art performance on six tasks in the software engineering domain, including Code Documentation Generation, Source Code Summarization, Code Comment Generation, etc.