A research team from DeepMind and Onshape combines a general-purpose language modelling technique and an off-the-shelf data serialization protocol to propose a machine learning model that can automatically generate high-quality sketches for Computer-Aided Design.
A research team from MIT and MIT-IBM Watson AI Lab proposes Curious Representation Learning (CRL), a framework that learns to understand the surrounding environment by training a reinforcement learning (RL) agent to maximize the error of a representation learner, which gives the agent an incentive to explore the environment.
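To make the curiosity signal concrete, here is a minimal toy sketch of the idea that the agent's reward is the learner's current error. It is not the CRL implementation: the `RunningMeanLearner` class, its squared-error measure and the `curiosity_reward` helper are hypothetical stand-ins chosen for illustration, and the representation learner and loss used in the paper may differ.

```python
class RunningMeanLearner:
    """Toy stand-in for a representation learner (an assumption, not CRL's model)."""

    def __init__(self):
        self.mean = 0.0   # current estimate of a "typical" observation
        self.count = 0    # number of observations seen so far

    def error(self, obs):
        # Squared distance between the observation and the current estimate.
        return (obs - self.mean) ** 2

    def update(self, obs):
        # Incremental mean update: the learner tries to reduce its own error.
        self.count += 1
        self.mean += (obs - self.mean) / self.count


def curiosity_reward(learner, obs):
    """Reward the agent with the learner's error on obs, then train the learner."""
    reward = learner.error(obs)
    learner.update(obs)
    return reward


learner = RunningMeanLearner()
first = curiosity_reward(learner, 1.0)     # unfamiliar observation
repeated = curiosity_reward(learner, 1.0)  # same observation again
novel = curiosity_reward(learner, 5.0)     # something new
print(first, repeated, novel)
```

In this toy setting a repeated observation earns less reward than a novel one, which is the pressure toward exploration that the summary describes.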
A research team from Facebook AI conducts a large-scale study on unsupervised spatiotemporal representation learning from videos. The work takes a unified perspective on four recent image-based frameworks (MoCo, SimCLR, BYOL, SwAV) and investigates a simple objective that can easily generalize unsupervised representation learning methodologies to space-time.
Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to geometrically unify the typical architectures of CNNs, GNNs, LSTMs, Transformers, etc. from the perspective of symmetry and invariance to build an “Erlangen Programme” for deep neural networks.
A research team from Huawei Noah’s Ark Lab and Tsinghua University proposes Extract Then Distill (ETD), a generic and flexible strategy for reusing teacher model parameters for efficient and effective task-agnostic distillation that can be applied to student models of any size.
Researchers from Carnegie Mellon University, the University of Texas at Austin and Facebook AI propose a novel paradigm to optimize widths for each CNN layer. The method is compatible across various width optimization algorithms and networks and achieves up to a 320x reduction in width optimization overhead without compromising top-1 accuracy on ImageNet.
IBM and ETH Zurich researchers make progress in reconciling neurophysiological insights with machine intelligence, proposing a novel biologically inspired optimizer for artificial neural networks (ANNs) and spiking neural networks (SNNs) that incorporates synaptic integration principles from biology. The optimizer, GRAPES (Group Responsibility for Adjusting the Propagation of Error Signals), improves the training convergence time, accuracy and scalability of ANNs and SNNs.
A research team from Google Research proposes small, fast, on-device disfluency detection models based on the BERT architecture. The smallest model size is only 1.3 MiB, representing a size reduction of two orders of magnitude and an inference latency reduction of a factor of eight compared to state-of-the-art BERT-based models.
A research team from Google and the University of California, Berkeley calculates the energy use and carbon footprint of large-scale models T5, Meena, GShard, Switch Transformer and GPT-3, and identifies methods and publication guidelines that could help reduce their CO2e footprint.
A research team from McGill University, Mila – Quebec AI Institute and Facebook AI proposes novel metrics and perturbation functions to detect, quantify and compare trade-offs between robustness and faithfulness in NMT systems, both on the corpus level and with particular examples.
A research team from ETH Zurich leverages existing spike-based learning circuits to propose a biologically plausible architecture that is highly successful in classifying distinct and complex spatio-temporal spike patterns. The work contributes to the design of ultra-low-power mixed-signal neuromorphic processing systems capable of distinguishing spatio-temporal patterns in spiking activity.
A research team from NVIDIA, Stanford University and Microsoft Research proposes a novel pipeline parallelism approach that improves throughput by more than 10 percent with a comparable memory footprint, showing that such strategies can achieve high aggregate throughput while training models with up to a trillion parameters.
A research team from ETH Zurich and UC Berkeley proposes Deep Reward Learning by Simulating the Past (Deep RLSP), an algorithm that represents rewards directly as a linear combination of features learned through self-supervised representation learning and enables agents to simulate human actions backwards in time to infer what they must have done.
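As a rough illustration of what "a linear combination of features" means here, the sketch below computes a reward as the dot product of a weight vector and a feature vector. The function name `linear_reward` and the example numbers are hypothetical. In Deep RLSP the features come from a learned encoder and the weights are inferred by the algorithm, neither of which is shown.

```python
def linear_reward(features, weights):
    """Reward as a weighted sum of features: r = sum_i weights[i] * features[i]."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical two-dimensional feature vector and weight vector.
state_features = [1.0, 2.0]
reward_weights = [0.5, -1.0]
print(linear_reward(state_features, reward_weights))
```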
A research team from IBM introduces two systems for predicting information type: the TypeSuggest module, an unsupervised system designed to generate types for a set of seed query terms input by the user; and an Answer Type prediction module for predicting the correct answer type for user-provided questions.
A research team from the Technical University of Munich, Google, NVIDIA and LMU München proposes CodeTrans, an encoder-decoder transformer model that achieves state-of-the-art performance on six tasks in the software engineering domain, including Code Documentation Generation, Source Code Summarization and Code Comment Generation.
A team from the University of Michigan, MIT-IBM Watson AI Lab and ShanghaiTech University publishes two papers on individual fairness for ML models, introducing a scale-free, interpretable and statistically principled approach for assessing individual fairness and a method for enforcing individual fairness in gradient boosting that is suitable for non-smooth ML models.
A research team from Princeton University and Microsoft Research discovers that autonomous language-understanding agents are capable of achieving high scores even in the complete absence of language semantics, indicating that current RL agents for text-based games might not be sufficiently leveraging the semantic structure of game texts.
A research team from DeepMind and the University of Alberta proposes Policy-guided Heuristic Search (PHS), a novel search algorithm that uses both a heuristic function and a policy while offering guarantees on the search loss that relate to both the quality of the heuristic and the policy.
Tsinghua & MIT researchers break the stereotype that GPTs can generate but not understand language, showing that GPTs can compete with BERT models on natural language understanding tasks using a novel P-tuning method that can also improve BERT performance in both few-shot and supervised settings.