A Google Research team conducts a systematic exploration comprising more than 4800 experiments on Vision Transformers, MLP-Mixers and ResNets with parameters ranging from 10 million to 10 billion, evaluated on more than 20 downstream image recognition tasks, aiming to capture the nonlinear relationships between performance on upstream and downstream tasks.
A research team proposes ConvMixer, an extremely simple model designed to support the argument that the impressive performance of vision transformers (ViTs) is mainly attributable to their use of patches as the input representation. The study shows that ConvMixer can outperform ViTs, MLP-Mixers and classical vision models.
An Apple research team explores multiple architectures and training procedures to develop a novel multi-speaker and multi-lingual neural TTS system. The study combines speech from 30 speakers from 15 locales in 8 languages, and demonstrates that for the vast majority of voices, such multi-lingual and multi-speaker models can yield better quality than single speaker models.
In a 200+ page paper, Percy Liang, Fei-Fei Li, and over 100 other researchers from the Stanford University Center for Research on Foundation Models (CRFM) systematically describe the opportunities and risks of large-scale pretrained “foundation” models. The unique study aims to provide a clearer understanding of how these models work, when and how they fail, and the various capabilities provided by their emergent properties.
A research team from Università di Firenze, Università di Siena, University of Cambridge and Universitè Côte d’Azur proposes a general approach to explainable artificial intelligence (XAI) in neural architectures, designing interpretable deep learning models called Logic Explained Networks (LENs). The novel approach yields better performance than established white-box models while providing more compact and meaningful explanations.
A research team from the University of Science and Technology of China, Microsoft Cloud AI, City University of Hong Kong and Wormpex AI Research propose a robust and invisible backdoor attack called “Poison Ink” and demonstrates its immunity to state-of-the-art defence techniques.
A research team from Zhejiang University, Wuhan University and Adobe Research proposes Feature Importance-Aware Attacks (FIA) that drastically improve the transferability of adversarial examples, achieving superior performance compared to state-of-the-art transferable attacks.
A DeepMind research team proposes Perceiver IO, a single network that can easily integrate and transform arbitrary information for arbitrary tasks while scaling linearly with both input and output sizes. The general architecture achieves outstanding results on tasks with highly structured output spaces, such as natural language and visual understanding.
A research team from Google Research and Northwestern University presents polynomial time and sample-efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, aiming to provide insights into whether efficient algorithms exist for learning ReLU networks.
A research team from Facebook AI and UC Berkeley finds a solution for vision transformers’ optimization instability problem by simply using a standard, lightweight convolutional stem for ViT models. The approach dramatically increases optimizer stability and improves peak performance without sacrificing computation efficiency.
Researchers from Google conduct a survey on how to make Deep Learning models smaller, faster, and better. The team focuses on core areas of model efficiency, from modelling techniques to hardware support, and open-sources an experiment-based guide and code to help practitioners optimize their model training and deployment.
A research team from Mila, McGill University, Université de Montréal, DeepMind and Microsoft proposes GFlowNet, a novel flow network-based generative method that can turn a given positive reward into a generative policy that samples with a probability proportional to the return.v
An IEEE team provides a comprehensive overview of the bottom-up and top-down design approaches toward neuromorphic intelligence, highlighting the different levels of granularity present in existing silicon implementations and assessing the benefits of the different circuit design styles in neural processing systems.
A research team from Google proposes GSPMD, an automatic parallelism system for ML computation graphs that uses simple tensor sharding annotations to achieve different parallelism paradigms in a unified way, including data parallelism, within-layer model parallelism, spatial partitioning, weight-update sharding, optimizer-state sharding and pipeline parallelism.
A research team from ETH Zurich leverages existing spike-based learning circuits to propose a biologically plausible architecture that is highly successful in classifying distinct and complex spatio-temporal spike patterns. The work contributes to the design of ultra-low-power mixed-signal neuromorphic processing systems capable of distinguishing spatio-temporal patterns in spiking activity.