In the new paper YOLOv7: Trainable Bag-Of-Freebies Sets New State-Of-The-Art for Real-Time Object Detectors, an Academia Sinica research team releases YOLOv7. This latest YOLO version introduces novel “extend” and “compound scaling” methods that effectively utilize parameters and computation, and it surpasses all known real-time object detectors in speed and accuracy.
The BAAI Conference 2022 kicked off on May 31 in Beijing and ran through June 2. AI experts, industry leaders, young talents and international delegates joined the virtual gathering and live stream for three busy days of high-level keynotes, tech talks, parallel forums and networking.
In the new paper A Modern Self-Referential Weight Matrix That Learns to Modify Itself, a research team from The Swiss AI Lab, IDSIA, University of Lugano (USI) & SUPSI, and King Abdullah University of Science and Technology (KAUST) presents a scalable self-referential weight matrix (SRWM) that leverages outer products and the delta update rule to update and improve itself.
A DeepMind research team argues that the mathematical description of symmetries in group theory is an important foundation that determines the structure of the universe, constrains the nature of natural tasks, and consequently shapes both biological and artificial intelligence. The study proposes symmetry transformations as a fundamental principle for defining what makes good representations.
A DeepMind research team proposes ReLICv2, which demonstrates for the first time that representations learned without labels can consistently outperform a strong, supervised baseline on ImageNet and even achieve comparable results to state-of-the-art self-supervised vision transformers (ViTs).
In the new paper A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More, a research team from MIT, Columbia University, Harvard University and the University of Waterloo proposes a neural network that can solve university-level mathematics problems via program synthesis.
In the new paper On the Integration of Self-Attention and Convolution, a research team from Tsinghua University, Huawei Technologies Ltd. and the Beijing Academy of Artificial Intelligence proposes ACmix, a mixed model that leverages the benefits of both self-attention and convolution for computer vision representation tasks while adding minimal computational overhead compared to its pure convolution or self-attention counterparts.
A research team from Google Research, the University of Cambridge and the Alan Turing Institute proposes PolyViT, a single transformer model capable of processing multiple modalities and datasets. PolyViT is parameter-efficient and learns representations that generalize across multiple domains.
A research team from the University of Southern California and Google proposes TOME, a “mention memory” approach to factual knowledge extraction for NLU tasks. A transformer model with attention over a semi-parametric representation of the entire Wikipedia text corpus, TOME can extract information without supervision and achieves strong performance on multiple open-domain question answering benchmarks.
A Google Research team explores the design space of Transformer models in an effort to enable deep learning architectures to solve compositional tasks. The proposed approach provides models with inductive biases via design decisions that significantly impact compositional generalization, and achieves state-of-the-art results on the COGS and PCFG composition benchmarks.
A research team from Facebook AI and UC Berkeley addresses vision transformers’ optimization instability problem by simply using a standard, lightweight convolutional stem for ViT models. The approach dramatically increases optimizer stability and improves peak performance without sacrificing computational efficiency.
A research team from McGill University, Université de Montréal, DeepMind and Mila presents an end-to-end, model-based deep reinforcement learning (RL) agent that dynamically attends to relevant parts of its environment to facilitate out-of-distribution (OOD) and systematic generalization.
A research team from ETH Zürich presents an overview of priors for (deep) Gaussian processes, variational autoencoders and Bayesian neural networks. The researchers argue that well-chosen priors can deliver theoretical and empirical properties such as uncertainty estimation, model selection and optimal decision support, and they provide guidance on how to choose such priors.
Twitter Chief Scientist Michael Bronstein, Joan Bruna from New York University, Taco Cohen from Qualcomm AI and Petar Veličković from DeepMind publish a paper that aims to geometrically unify the typical architectures of CNNs, GNNs, LSTMs, Transformers, etc. from the perspective of symmetry and invariance to build an “Erlangen Programme” for deep neural networks.
IBM and ETH Zurich researchers make progress in reconciling neurophysiological insights with machine intelligence, proposing a novel biologically inspired optimizer for artificial neural networks (ANNs) and spiking neural networks (SNNs) that incorporates synaptic integration principles from biology. The optimizer, GRAPES (Group Responsibility for Adjusting the Propagation of Error Signals), improves the training-time convergence, accuracy and scalability of ANNs and SNNs.
A research team from NVIDIA, Stanford University and Microsoft Research proposes a novel pipeline parallelism approach that improves throughput by more than 10 percent with a comparable memory footprint, showing that such strategies can achieve high aggregate throughput while training models with up to a trillion parameters.
Stanford researchers’ DERL (Deep Evolutionary Reinforcement Learning) is a novel computational framework that enables AI agents to evolve morphologies and learn challenging locomotion and manipulation tasks in complex environments using only low-level egocentric sensory information.