AI Computer Vision & Graphics Machine Learning & Data Science Research

Microsoft’s ‘Florence’ General-Purpose Foundation Model Achieves SOTA Results on Dozens of CV Benchmarks

In the paper Florence: A New Foundation Model for Computer Vision, a Microsoft research team proposes Florence, a novel foundation model for computer vision that significantly outperforms previous large-scale pretraining approaches and achieves new SOTA results across a wide range of visual and visual-linguistic benchmarks.

AI Machine Learning & Data Science Research

Kwai, Kuaishou & ETH Zürich Propose PERSIA, a Distributed Training System That Supports Deep Learning-Based Recommenders of up to 100 Trillion Parameters

A research team from Kwai Inc., Kuaishou Technology and ETH Zürich builds PERSIA, an efficient distributed training system that leverages a novel hybrid training algorithm to ensure both training efficiency and accuracy for extremely large deep learning recommender systems of up to 100 trillion parameters.

AI Machine Learning & Data Science Research

SPANN: A Highly-Efficient Billion-Scale Approximate Nearest Neighbour Search That’s 2× Faster Than the SOTA Method

A research team from Microsoft, Peking University, Tencent, and Baidu proposes SPANN, a simple but efficient memory-disk hybrid vector indexing and search system that guarantees both low latency and high recall, achieving a 2× speedup over the state-of-the-art approximate nearest neighbour search (ANNS) solution while retaining the same recall quality and memory cost.
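
The memory-disk split described above can be sketched in a few lines. The class below is a toy illustration, not SPANN's actual implementation: it uses naive random-sample clustering (SPANN uses balanced hierarchical clustering with boundary-vector replication) and a plain dict standing in for on-disk posting lists.

```python
import math
import random

def l2(a, b):
    return math.dist(a, b)

class HybridANNIndex:
    """Toy memory-disk hybrid index in the spirit of SPANN (illustrative only).

    Cluster centroids live 'in memory'; full posting lists live 'on disk'
    (simulated here by a dict). A query only touches the few postings whose
    centroids are closest, trading a small recall loss for far less RAM than
    holding every vector in memory.
    """

    def __init__(self, vectors, n_clusters=4, seed=0):
        rng = random.Random(seed)
        # Crude clustering: pick random vectors as centroids.
        self.centroids = rng.sample(vectors, n_clusters)
        self.postings = {i: [] for i in range(n_clusters)}  # the "disk"
        for idx, v in enumerate(vectors):
            c = min(range(n_clusters), key=lambda i: l2(v, self.centroids[i]))
            self.postings[c].append((idx, v))

    def search(self, query, k=1, n_probe=2):
        # 1) In memory: rank centroids, probe only the closest few.
        probe = sorted(range(len(self.centroids)),
                       key=lambda i: l2(query, self.centroids[i]))[:n_probe]
        # 2) On "disk": exact search within the probed posting lists only.
        cands = [p for c in probe for p in self.postings[c]]
        cands.sort(key=lambda p: l2(query, p[1]))
        return [idx for idx, _ in cands[:k]]
```

Raising `n_probe` trades latency for recall; with `n_probe` equal to the cluster count the search degenerates to exact brute force.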

AI Machine Learning & Data Science Research

Is BERT the Future of Image Pretraining? ByteDance Team’s BERT-like Pretrained Vision Transformer iBOT Achieves New SOTAs

A research team from ByteDance, Johns Hopkins University, Shanghai Jiao Tong University and UC Santa Cruz seeks to apply the proven technique of masked language modelling to the training of better vision transformers, presenting iBOT (Image BERT Pre-Training with Online Tokenizer), a self-supervised framework that performs masked prediction with an online tokenizer.
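
The core loss can be sketched numerically. Below is a toy numpy illustration of masked prediction against an online tokenizer, with simple linear heads standing in for the paper's ViT backbones and the teacher's EMA momentum update omitted; the shapes and loss form are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, t=1.0):
    z = np.exp((x - x.max(axis=-1, keepdims=True)) / t)
    return z / z.sum(axis=-1, keepdims=True)

# Toy "networks": shared linear heads standing in for ViT backbones.
# In iBOT the teacher is an exponential moving average (EMA) of the
# student, so the tokenizer is learned online rather than fixed.
dim, n_tokens, n_patches = 8, 16, 6
student_w = rng.normal(size=(dim, n_tokens))
teacher_w = student_w.copy()          # EMA copy (momentum step omitted)

patches = rng.normal(size=(n_patches, dim))
mask = np.array([1, 0, 1, 0, 0, 1])   # 1 = patch replaced by a mask token

mask_token = np.zeros(dim)
student_in = np.where(mask[:, None] == 1, mask_token, patches)

# Teacher tokenizes the *unmasked* patches into soft token distributions;
# the student must predict them at the masked positions (MIM loss).
teacher_tokens = softmax(patches @ teacher_w, t=0.5)   # sharpened targets
student_logp = np.log(softmax(student_in @ student_w))

loss = -(teacher_tokens * student_logp).sum(axis=-1)[mask == 1].mean()
```

Because the teacher is an EMA of the student rather than a frozen, pre-trained tokenizer (as in BEiT), the visual vocabulary co-evolves with the model during pre-training.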

AI Machine Learning & Data Science Research

Google Brain & Radboud U ‘Dive Into Chaos’ to Show Gradients Are Not All You Need in Dynamical Systems

In the new paper Gradients Are Not All You Need, a Google Brain and Radboud University research team discusses a “particularly sinister” chaos-based failure mode that appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics simulation to training learned optimizers.

AI Machine Learning & Data Science Research

Can ViT Layers Express Convolutions? Peking U, UCLA & Microsoft Researchers Say ‘Yes’

In the new paper Can Vision Transformers Perform Convolution?, a research team from Peking University, UCLA and Microsoft Research proves constructively that a single ViT layer with image patches as input can perform any convolution operation, and shows that ViT performance in low-data regimes can be significantly improved using the team's proposed training pipeline.

AI Machine Learning & Data Science Natural Language Tech Research

Introducing MetaICL: A Language Model Meta-Training Framework for Few-Shot In-Context Learning

A research team from the University of Washington, Facebook AI Research and the Allen Institute for AI introduces Meta-training for In-Context Learning (MetaICL), a new meta-training framework for few-shot learning in which an LM is meta-trained to learn in-context — conditioning on training examples to recover the task and make predictions.
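
The "conditioning on training examples" setup can be made concrete with a small helper. This is an illustrative sketch of the in-context input format, not MetaICL's exact template; the `" -> "` separator is an assumption, and MetaICL additionally meta-trains the LM on many such tasks so that this format works at test time on unseen tasks.

```python
def build_icl_prompt(train_examples, query_input, sep="\n"):
    """Concatenate k (input, output) training pairs plus a query input,
    the way in-context learning conditions a language model. The LM is
    expected to complete the final line with the query's output."""
    parts = [f"{x} -> {y}" for x, y in train_examples]
    parts.append(f"{query_input} ->")
    return sep.join(parts)
```

During meta-training, prompts like this are sampled from a large collection of tasks and the LM is trained to predict the held-out output, so that at test time k examples from a brand-new task suffice to "recover" it.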

AI Machine Learning & Data Science Research

Google & UC Berkeley’s Data-Driven Offline Optimization Approach Significantly Boosts Hardware Accelerator Performance, Reduces Simulation Time by More Than 90%

A research team from Google Research and UC Berkeley proposes PRIME, an offline data-driven approach that can architect hardware accelerators without any form of simulation. Compared to state-of-the-art simulation-driven methods, PRIME achieves impressive performance improvements of up to 1.54× while reducing the total required simulation time by up to 99 percent.

AI Machine Learning & Data Science Research

Washington U & Google Study Reveals How Attention Matrices Are Formed in Encoder-Decoder Architectures

In the new paper Understanding How Encoder-Decoder Architectures Attend, researchers from the University of Washington, Google Blueshift Team and Google Brain Team propose a method for decomposing hidden states over a sequence into temporal- and input-driven components, revealing how attention matrices are formed in encoder-decoder networks.
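
One simple way to realise such a decomposition — offered here as an illustrative assumption, not necessarily the paper's exact procedure — is to average hidden states across many inputs: the per-position mean captures the temporal (input-independent) component, and the residual captures what varies with the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# hidden: (n_sequences, seq_len, d) hidden states collected from some
# encoder over a dataset. The sinusoid injects a shared positional signal
# so the toy data actually has a temporal component to recover.
hidden = rng.normal(size=(32, 10, 4)) + np.sin(np.arange(10))[None, :, None]

temporal = hidden.mean(axis=0)           # position-dependent, input-independent
input_driven = hidden - temporal[None]   # residual: varies with the input
```

Comparing attention weights against each component separately then reveals which part of a hidden state the decoder's attention is actually keying on.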

AI Computer Vision & Graphics Machine Learning & Data Science Research

Softmax-free Vision Transformer With Linear Complexity: Achieving a Superior Accuracy/Complexity Trade-off

Researchers from Fudan University, the University of Surrey and Huawei Noah’s Ark Lab trace the quadratic complexity of vision transformers (ViTs) to the retention of softmax self-attention during approximation. The team proposes the first softmax-free transformer (SOFT), which reduces the self-attention computation to linear complexity, achieving a superior trade-off between accuracy and complexity.
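
Why removing the softmax enables linear complexity comes down to associativity: softmax forces the full n×n attention matrix to be materialised, but a decomposable similarity lets the matrix products be regrouped. The sketch below uses a generic positive feature map as a stand-in; SOFT itself uses a Gaussian kernel with a low-rank (Nyström-style) decomposition, which this toy does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 16                         # n tokens, d-dim heads
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

def quadratic_attn(Q, K, V, phi):
    """Softmax-style route: the (n, n) matrix must be materialised."""
    A = phi(Q) @ phi(K).T              # (n, n) -- O(n^2) memory/compute
    A = A / A.sum(axis=1, keepdims=True)
    return A @ V

def linear_attn(Q, K, V, phi):
    """Without softmax, (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V),
    so the cost drops to O(n d^2) with only (d, d) intermediates."""
    KV = phi(K).T @ V                  # (d, d)
    Z = phi(Q) @ phi(K).sum(axis=0)    # row normalisers, shape (n,)
    return (phi(Q) @ KV) / Z[:, None]

# Any positive, elementwise feature map keeps the normaliser well-defined.
phi = lambda x: np.exp(-0.5 * x ** 2) + 1e-6
```

Both routes return identical outputs; only the second avoids the quadratic intermediate, which is what makes softmax-free attention scale to long token sequences.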

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google Open-Sources SCENIC: A JAX Library for Rapid Computer Vision Model Prototyping and Cutting-Edge Research

A research team from Google Brain and Google Research introduces SCENIC, an open-source JAX library for fast and extensible computer vision research and beyond. SCENIC currently provides implementations of state-of-the-art vision models such as ViT, DETR and MLP-Mixer, and more open-sourced cutting-edge projects will be added in the near future.

AI Machine Learning & Data Science Research

Deeper Is Not Necessarily Better: Princeton U & Intel’s 12-Layer Parallel Networks Achieve Performance Competitive With SOTA Deep Networks

In the new paper Non-deep Networks, a research team from Princeton University and Intel Labs argues it is possible to achieve high performance with “non-deep” neural networks, presenting ParNet (Parallel Networks), a novel 12-layer architecture that achieves performance competitive with its state-of-the-art deep counterparts.

AI Machine Learning & Data Science Natural Language Tech Popular Research

Mention Memory: Incorporating Factual Knowledge From Various Sources Into Transformers Without Supervision

A research team from the University of Southern California and Google proposes TOME, a “mention memory” approach to factual knowledge extraction for NLU tasks. A transformer model with attention over a semi-parametric representation of the entire Wikipedia text corpus, TOME can extract information without supervision and achieves strong performance on multiple open-domain question answering benchmarks.

AI Machine Learning & Data Science Research

NVIDIA’s StyleGAN3 Is Fully Equivariant to Translation and Rotation, Improving GAN-Based Animation Generation

An NVIDIA and Aalto University research team presents StyleGAN3, a novel generative adversarial network (GAN) architecture where the exact sub-pixel position of each feature is exclusively inherited from the underlying coarse features, enabling a more natural transformation hierarchy and advancing GAN-based animation generation.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Are Patches All You Need? New Study Proposes Patches Are Behind Vision Transformers’ Strong Performance

A research team proposes ConvMixer, an extremely simple model designed to support the argument that the impressive performance of vision transformers (ViTs) is mainly attributable to their use of patches as the input representation. The study shows that ConvMixer can outperform ViTs, MLP-Mixers and classical vision models.
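
The "patches as the input representation" idea is simple enough to sketch directly. Below is an illustrative numpy version of a patch embedding plus one ConvMixer-style block (depthwise convolution mixes spatial locations, a pointwise projection mixes channels); the actual ConvMixer is a few lines of PyTorch with BatchNorm, GELU and residual connections, which are simplified away here.

```python
import numpy as np

def patch_embed(img, p, W):
    """Split an (H, W, C) image into non-overlapping p x p patches and
    project each to `dim` channels -- the 'patches' input representation
    shared with ViTs."""
    H, Wd, C = img.shape
    h, w = H // p, Wd // p
    patches = img[:h * p, :w * p].reshape(h, p, w, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(h, w, p * p * C)
    return patches @ W                           # (h, w, dim)

def convmixer_block(x, dw_kernel, pw_W):
    """One ConvMixer-style block: a depthwise conv (one k x k filter per
    channel) mixes spatial locations, then a pointwise (1x1) projection
    mixes channels. ReLU stands in for GELU; BatchNorm is omitted."""
    h, w, dim = x.shape
    k = dw_kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    spatial = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            win = xp[i:i + k, j:j + k]           # (k, k, dim) window
            spatial[i, j] = (win * dw_kernel).sum(axis=(0, 1))
    return np.maximum(spatial + x, 0) @ pw_W     # residual, ReLU, 1x1 mix
```

Stacking such blocks on top of the patch embedding gives the whole model; the study's point is that this patch-based isotropic design, not attention, drives much of the ViT-class performance.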

AI Computer Vision & Graphics Machine Learning & Data Science Research

Apple Study Reveals the Learned Visual Representation Similarities and Dissimilarities Between Self-Supervised and Supervised Methods

An Apple research team performs a comparative analysis on a contrastive self-supervised learning (SSL) algorithm (SimCLR) and a supervised learning (SL) approach for simple image data in a common architecture, shedding light on the similarities and dissimilarities in their learned visual representation patterns.