A research team from Peking University, ETH Zürich and Kuaishou Technology proposes Hyper-Tune, an efficient and robust distributed hyperparameter-tuning framework that features system optimizations such as automatic resource allocation, asynchronous scheduling and a multi-fidelity optimizer, and achieves state-of-the-art performance on multiple tuning tasks.
A research team from University Medical Center Freiburg, ML Collective, and Google Brain introduces SimpleBits — an information-reduction method that learns to synthesize simplified inputs that contain less information yet remain informative for the task, providing a new approach for exploring the basis of network decisions.
A Microsoft research team proposes DeepSpeed-MoE, comprising a novel MoE architecture design and model compression technique that reduces MoE model size by up to 3.7x and a highly optimized inference system that provides 7.3x better latency and cost compared to existing MoE inference solutions.
A DeepMind research team proposes ReLICv2, which demonstrates for the first time that representations learned without labels can consistently outperform a strong, supervised baseline on ImageNet and even achieve comparable results to state-of-the-art self-supervised vision transformers (ViTs).
A research team from Rensselaer Polytechnic Institute, Thomas J. Watson Research Center and the University of California, Los Angeles proposes a novel framework for effective pretrained neural network model selection for downstream tasks that forecasts the predictive ability of a model with its cumulative information in the early phase of neural network training.
A team from Facebook AI Research and UC Berkeley proposes ConvNeXts, a pure ConvNet model that achieves performance comparable with state-of-the-art hierarchical vision transformers on computer vision benchmarks while retaining the simplicity and efficiency of standard ConvNets.
A research team from Google, Purdue University and Harvard University presents CFU Playground, a full-stack open-source framework for the rapid and iterative design of accelerators for embedded ML systems, enabling developers with minimal FPGA and hardware experience to achieve model speedups of up to 75x.
PhD electronic researcher Ildar Rakhmatulin and brain-computer interface developer Sebastian Völkl open-source an inexpensive, high-precision, easy-to-maintain PIEEG board that can convert a Raspberry Pi into a brain-computer interface for measuring and processing eight real-time EEG (Electroencephalography) signals.
A team from Google Research, University of Pennsylvania and Cornell University proposes a principled perspective to filter out common memorization for LMs, introducing “counterfactual memorization” to measure the expected change in a model’s prediction and distinguish “rare” (episodic) memorization from “common” (semantic) memorization in neural LMs.
Baidu researchers propose ERNIE-ViLG, a 10-billion parameter scale pretraining framework for bidirectional text-image generation. Pretrained on 145 million (Chinese) image-text pairs, ERNIE-ViLG achieves state-of-the-art performance on both text-to-image and image-to-text generation tasks.
A research team from Yale and IBM presents Kernel Graph Neural Networks (KerGNNs), which integrate graph kernels into the message passing process of GNNs in one framework, achieving performance comparable to state-of-the-art methods and significantly improving model interpretability compared with conventional GNNs.
In the new paper A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More, a research team from MIT, Columbia University, Harvard University and University of Waterloo proposes a neural network that can solve university-level mathematics problems via program synthesis.
A research from the Fujitsu AI Laboratory, the University of Tokyo and the RIKEN Center for Advanced Intelligence Project proposes a modularization method that decomposes a DNN into small modules from a functionality perspective and recomposes them into new models appropriate for other tasks.
In the new paper Masked Feature Prediction for Self-Supervised Visual Pre-Training, a Facebook AI Research and Johns Hopkins University team presents a novel Masked Feature Prediction (MaskFeat) approach for the self-supervised pretraining of video models that achieves SOTA results on video benchmarks.
An OpenAI research team fine-tunes the GPT-3 pretrained language model to enable it to answer long-form questions by searching and navigating a text-based web browsing environment, achieving retrieval and synthesis improvements and reaching human-level long-form question-answering performance.
A Facebook AI Research team presents FLAVA, a foundational language and vision alignment model that explicitly targets language, vision, and their multimodal combination all at once, achieving impressive performance on 35 tasks across the vision, language, and multimodal domains.
In the new paper Self-attention Does Not Need O(n2) Memory, a Google Research team presents novel and simple algorithms for attention and self-attention that require only constant memory and logarithmic memory and reduce the self-attention memory overhead by 59x for inference and by 32x for differentiation at a sequence length of 16384.
A DeepMind research team proposes RETRO (Retrieval-Enhanced Transformer), an enhanced auto-regressive language model that conditions on document chunks retrieved from a large corpus and achieves performance comparable to GPT-3 and Jurassic-1 on the Pile dataset while using 25× fewer parameters.
Peng Cheng Laboratory (PCL) and Baidu release PCL-BAIDU Wenxin, the world’s first knowledge-enhanced 100-billion-scale pretrained language model and the largest Chinese-language monolithic model with 260 billion parameters. PCL-BAIDU Wenxin achieves state-of-the-art results on more than 60 tasks and significantly advances more than 30 benchmarks for zero-shot and few-shot learning.
DeepMind researchers introduce Player of Games (PoG), a general-purpose algorithm that applies self-play learning, search, and game-theoretic reasoning to perfect and imperfect information games, taking an important step toward truly general algorithms for arbitrary environments.
In the new paper Understanding the World Through Action, UC Berkeley assistant professor in the department of electrical engineering and computer sciences Sergey Levine argues that a general, principled, and powerful framework for utilizing unlabelled data can be derived from reinforcement learning to enable machine learning systems leveraging large datasets to understand the real world.
In the new paper On the Integration of Self-Attention and Convolution, a research team from Tsinghua University, Huawei Technologies Ltd. and the Beijing Academy of Artificial Intelligence proposes ACmix, a mixed model that leverages the benefits of both self-attention and convolution for computer vision representation tasks while achieving minimum computational overhead compared to its pure convolution or self-attention counterparts.
In the new paper Sparse is Enough in Scaling Transformers, a research team from the University of Warsaw, Google Research and OpenAI proposes Scaling Transformers, a family of novel transformers that leverage sparse layers to scale efficiently and perform unbatched decoding much faster than original transformers, enabling fast inference on long sequences even with limited memory.
A team from Google Research, Stanford University, University of Massachusetts, University of California, Columbia University, Princeton University, Max Planck Institute for the Physics of Complex Systems and University of Oxford uses a quantum processor to observe a time crystal, a new phase of matter which could be one of the most significant physical discoveries in decades.
A research team from Google Research, University of Cambridge and Alan Turing Institute proposes PolyViT, a single transformer model capable of processing multiple modalities and datasets. PolyViT is parameter-efficient and learns representations that generalize across multiple domains.
In the paper A New Foundation Model for Computer Vision, a Microsoft research team proposes Florence, a novel foundation model for computer vision that significantly outperforms previous large-scale pretraining approaches and achieves new SOTA results across a wide range of visual and visual-linguistic benchmarks.
A research team from Kwai Inc., Kuaishou Technology and ETH Zürich builds PERSIA, an efficient distributed training system that leverages a novel hybrid training algorithm to ensure both training efficiency and accuracy for extremely large deep learning recommender systems of up to 100 trillion parameters.