A research team from Microsoft, Zhejiang University, Johns Hopkins University, Georgia Institute of Technology and University of Denver proposes Only-Train-Once (OTO), a one-shot DNN training and pruning framework that produces a slim architecture from a full heavy model without fine-tuning while maintaining high performance.
A Google Research team proposes Wordcraft, a text editor with a built-in AI-powered creative writing assistant. Wordcraft uses few-shot learning and the natural affordances of conversation to support a variety of user interactions, and can help with story planning, writing and editing.
A research team from Taichi Graphics, MIT CSAIL, Zhejiang University, Tsinghua University and Kuaishou Technology introduces a programming language and compiler for quantized simulation that achieves both high performance and significantly reduced memory costs by enabling flexible and aggressive quantization.
A research team from Baidu proposes ERNIE 3.0, a unified framework for pretraining large-scale, knowledge-enhanced models that can easily be tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning, and achieves state-of-the-art results on NLP tasks.
A research team from the University of Electronic Science and Technology of China, the Chinese Academy of Sciences, the School of Education at Shaanxi Normal University, the Japan Advanced Institute of Science and Technology and ETH Zurich encodes the basic belief assignment (BBA) into quantum states and implements them on a quantum circuit, aiming to utilize quantum computation characteristics to better handle belief functions.
University of Washington and the Allen Institute for Artificial Intelligence researchers say human evaluations are no longer the gold standard for evaluating natural language generation models, as evaluators’ focus on surface-level text qualities degrades their ability to accurately assess current NLG models’ overall capabilities.
Because many machine learning frameworks now support dynamic computational graphs, GPU memory utilization when training on such graphs has become a key measure of these frameworks. In the recently released v1.4, MegEngine reduces GPU memory usage at the cost of additional computation by using the Dynamic Tensor Rematerialization (DTR) technique together with further engineering optimizations, making large-batch training on a single GPU possible.
At the World Artificial Intelligence Conference (WAIC) held in Shanghai on July 9, Daosheng Tang, senior executive vice president of Tencent and president of Tencent's Cloud and Smart Industries Group, said that the company's Yangtze River AI Supercomputing Center, backed by an investment of RMB 45 billion (approx. USD 7 billion), will soon commence operation.
A research team from ByteDance AI Lab, University of Wisconsin–Madison and Nanjing University wins the ACL 2021 best paper award. Their proposed Vocabulary Learning via Optimal Transport (VOLT) approach leverages optimal transport to automatically find an optimal vocabulary without trial training.
A research team from Facebook AI and UC Berkeley finds a solution for vision transformers’ optimization instability problem by simply using a standard, lightweight convolutional stem for ViT models. The approach dramatically increases optimizer stability and improves peak performance without sacrificing computation efficiency.
A research team from Microsoft Research Asia, University of Science and Technology of China, Huazhong University of Science and Technology, and Tsinghua University takes advantage of the inherent spatiotemporal locality of videos to present a pure-transformer backbone architecture for video recognition that leads to a better speed-accuracy trade-off.
A research team from University of Cambridge, Imperial College London & Twitter, UCLA, MPI-MIS, and SJTU & UNSW proposes CW Networks (CWNs), a message-passing scheme that operates on regular cell complexes and achieves stronger expressive power than graph neural networks (GNNs).
A team from Google Research proposes prediction depth, a new measure of example difficulty determined from hidden embeddings. Their study reveals the surprising fact that the prediction depth of a given input has strong connections to a model’s uncertainty, confidence, accuracy and speed of learning for that data point.
Researchers from Google conduct a survey on how to make Deep Learning models smaller, faster, and better. The team focuses on core areas of model efficiency, from modelling techniques to hardware support, and open-sources an experiment-based guide and code to help practitioners optimize their model training and deployment.
A research team from Facebook AI Research and Mila – McGill University explores deep learning model accuracy versus time trade-offs in anytime learning, which they term Anytime Learning at Macroscale (ALMA). The team evaluates various models to gain insights on how to strike different trade-offs between accuracy and time to obtain a good learner.
A research team from Princeton University, the Institute of Applied Physics and Computational Mathematics and the Beijing Institute of Big Data Research uses the Deep Potential (DP) method to predict the phase diagram of water from ab initio quantum theory, from low temperature and pressure to about 2400 K and 50 GPa. The paper was published in leading physics journal Physical Review Letters and represents an important milestone in the application of DP.
On June 22, LG Electronics announced the launch of a new “digital X-ray detector” (DXD). The new product is equipped with AI-assisted diagnostic functions designed by healthcare AI solutions company VUNO. The product will analyze chest X-ray images for abnormal findings and highlight lesion areas with coloring and outlines, to help medical professionals accurately identify lung diseases including tuberculosis, pneumonia, and cancer.
On June 16, General Motors Co. announced it will increase its investments in electric vehicles (EVs) and autonomous vehicles (AVs) from 2020 through 2025 to USD 35 billion, representing a 75 percent increase from its initial commitment announced prior to the pandemic.
A research team from New York University and Google Research explores whether knowledge distillation really works, showing that a surprisingly large discrepancy often remains between the predictive distributions of the teacher and student models, even when the student has the capacity to perfectly match the teacher.
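To make the idea of a "discrepancy between predictive distributions" concrete, the short sketch below compares a teacher's and a student's class probabilities using two common measures: average KL divergence and top-1 agreement. The function names and the toy probabilities are hypothetical and chosen only for illustration; they are not taken from the paper's code or results.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def average_kl(teacher_probs, student_probs):
    """Mean KL(teacher || student) over a batch of examples."""
    pairs = list(zip(teacher_probs, student_probs))
    return sum(kl_divergence(t, s) for t, s in pairs) / len(pairs)


def top1_agreement(teacher_probs, student_probs):
    """Fraction of examples where teacher and student predict the same class."""
    def argmax(dist):
        return max(range(len(dist)), key=dist.__getitem__)

    pairs = list(zip(teacher_probs, student_probs))
    return sum(argmax(t) == argmax(s) for t, s in pairs) / len(pairs)


# Toy predictions (illustrative values only): three examples, three classes.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
student = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.3, 0.2]]

print("average KL:", average_kl(teacher, student))
print("top-1 agreement:", top1_agreement(teacher, student))
```

A student that matched the teacher perfectly would have an average KL of zero and an agreement of 1.0; larger KL or lower agreement indicates a larger gap between the two models.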
A research team from Mila, McGill University, Université de Montréal, DeepMind and Microsoft proposes GFlowNet, a novel flow network-based generative method that can turn a given positive reward into a generative policy that samples with a probability proportional to the return.
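As a rough illustration of what "sampling with a probability proportional to the return" means, the sketch below normalizes a small table of positive rewards into a distribution and draws from it. It shows only the target distribution over a tiny enumerable set; it is not a flow network and does not reflect how GFlowNet is trained. The object names and reward values are made up for the example.

```python
import random


def reward_proportional_probs(rewards):
    """Normalize positive rewards so that p(x) = R(x) / sum of all R."""
    total = sum(rewards.values())
    if total <= 0 or any(r <= 0 for r in rewards.values()):
        raise ValueError("all rewards must be positive")
    return {x: r / total for x, r in rewards.items()}


def sample_proportional(rewards, rng):
    """Draw one object with probability proportional to its reward."""
    items = list(rewards)
    weights = [rewards[x] for x in items]
    return rng.choices(items, weights=weights, k=1)[0]


# Hypothetical rewards for three candidate objects.
rewards = {"a": 1.0, "b": 3.0, "c": 4.0}
probs = reward_proportional_probs(rewards)
print(probs)  # "c" is four times as likely as "a"

rng = random.Random(0)
draws = [sample_proportional(rewards, rng) for _ in range(10_000)]
print({x: draws.count(x) / len(draws) for x in rewards})
```

Enumerating and normalizing rewards like this is only feasible for very small sets of candidates, which is part of the motivation for learning a generative policy instead.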
A Google Research team proposes MergeDistill, a framework for merging pretrained teacher LMs from multiple monolingual/multilingual LMs into a single multilingual task-agnostic student LM to leverage the capabilities of the powerful language-specific LMs while still being multilingual and enabling positive language transfer.
A research team from McGill University, Université de Montréal, DeepMind and Mila presents an end-to-end, model-based deep reinforcement learning (RL) agent that dynamically attends to relevant parts of its environments to facilitate out-of-distribution (OOD) and systematic generalization.