Weekly Papers | Praising PyTorch; Improving Lip Reading; Generating Structured Text and More

Thousands of machine learning papers are published every month, making it challenging to stay abreast of what’s happening across the research community. Every Friday Synced selects seven recent studies that present topical, innovative or otherwise interesting or important research we believe may be of interest to our readers.

Highlights this week:

  • Researchers detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture.
  • Researchers from Saudi Arabia and the UK propose a simple but effective building block for GANs called semantic region-adaptive normalization (SEAN).
  • Researchers led by Zhejiang University propose Lip by Speech (LIBS), a method that strengthens lip reading by learning from speech recognizers and achieves new SOTA performance on the CMLR and LRS2 datasets.
  • Researchers from Harvard and OpenAI show a “double-descent” phenomenon in deep learning: as model size increases, performance first gets worse and then gets better.

Paper One: PyTorch: An Imperative Style, High-Performance Deep Learning Library (arXiv)

Authors: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala from the University of Warsaw, Facebook AI Research, Google, NVIDIA, Orobix, Oxford University, Xamla, Nabla, Twitter, Facebook and Qure.

Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.
In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.
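
To make the "imperative, define-by-run" idea concrete, here is a minimal sketch in plain Python. It is not PyTorch and says nothing about how PyTorch is implemented internally: the toy `Value` class and the `model` function are hypothetical names invented for illustration. The point is only that the computation is recorded while ordinary Python code runs, so regular control flow such as a `for` loop decides the shape of the graph.

```python
class Value:
    """Toy scalar that records the operations applied to it (illustrative only)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # Values this one was computed from
        self._grad_fns = grad_fns    # maps upstream gradient to each parent's share

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Topologically order the recorded graph, then apply the chain rule.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, fn in zip(v._parents, v._grad_fns):
                parent.grad += fn(v.grad)


def model(x, w, steps):
    # Ordinary Python control flow decides how many multiplications are recorded.
    y = x
    for _ in range(steps):
        y = y * w
    return y + x


x, w = Value(2.0), Value(3.0)
out = model(x, w, steps=2)   # computes x * w**2 + x
out.backward()
print(out.data, x.grad, w.grad)
```

With `x = 2` and `w = 3`, the function is `x * w**2 + x`, so the derivative with respect to `x` is `w**2 + 1` and with respect to `w` is `2 * x * w`. Because the graph is built as the code runs, it can be stepped through with a normal Python debugger, which is the usability argument the abstract makes.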

Compared with three popular graph-based deep learning frameworks (CNTK, MXNet, and TensorFlow), a define-by-run framework (Chainer), and a production-oriented platform (PaddlePaddle), PyTorch’s performance is within 17% of that of the fastest framework. The paper’s benchmark table gives the specific results.

Paper Two: On The Equivalence Between Node Embeddings And Structural Graph Representations (arXiv)

Authors: Balasubramaniam Srinivasan and Bruno Ribeiro from Purdue University

Abstract: This work provides the first unifying theoretical framework for node (positional) embeddings and structural graph representations, bridging methods like matrix factorization and graph neural networks. Using invariant theory, we show that the relationship between structural representations and node embeddings is analogous to that of a distribution and its samples. We prove that all tasks that can be performed by node embeddings can also be performed by structural representations and vice-versa. We also show that the concept of transductive and inductive learning is unrelated to node embeddings and graph representations, clearing another source of confusion in the literature. Finally, we introduce new practical guidelines to generating and using node embeddings, which fixes significant shortcomings of standard operating procedures used today.

Paper Three: SEAN: Image Synthesis with Semantic Region-Adaptive Normalization (arXiv)

Authors: Peihao Zhu and Rameen Abdal from KAUST (King Abdullah University of Science and Technology); Yipeng Qin and Peter Wonka from Cardiff University

Abstract: We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region.

SEAN normalization. In the upper part, the style codes in ST undergo a per-style convolution and are then broadcast to their corresponding regions according to the segmentation mask M to yield a style map. The lower part (light blue layers) creates per-pixel normalization values using only the region information, similar to SPADE.

SEAN generator. (A) On the left, the style encoder takes an input image and outputs a style matrix ST. The generator on the right consists of interleaved SEAN ResBlocks and Upsampling layers. (B) A detailed view of a SEAN ResBlock used in (A).
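
The broadcast step described in the SEAN normalization caption can be sketched in a few lines of plain Python. This is a heavily simplified illustration under stated assumptions, not the authors' implementation: it handles a single flattened channel, the function name `region_adaptive_norm` is hypothetical, and the per-region `(gamma, beta)` pairs are hand-picked stand-ins for what the learned per-style convolutions would produce. It also leaves out the mask-only (SPADE-like) branch.

```python
from statistics import mean, pstdev


def region_adaptive_norm(features, mask, styles, eps=1e-5):
    """Normalize one channel, then modulate each pixel by its region's style.

    features: flat list of activations for one channel
    mask:     flat list of region labels, same length as features
    styles:   dict mapping region label -> (gamma, beta); hypothetical values
              standing in for the outputs of the learned per-style convolutions
    """
    mu = mean(features)
    sigma = pstdev(features)
    normalized = [(f - mu) / (sigma + eps) for f in features]
    # Broadcast each region's (gamma, beta) to the pixels the mask assigns to it.
    return [styles[r][0] * n + styles[r][1] for n, r in zip(normalized, mask)]


features = [1.0, 3.0, 1.0, 3.0]
mask = ["sky", "sky", "hair", "hair"]
styles = {"sky": (1.0, 0.0), "hair": (2.0, 5.0)}
out = region_adaptive_norm(features, mask, styles)
print(out)
```

Pixels labeled `sky` and `hair` start from the same normalized values but end up with different scales and offsets, which is the sense in which each semantic region can carry its own style.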

Paper Four: Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers (arXiv)

Authors: Ya Zhao, Rui Xu, and Mingli Song from Zhejiang University; Xinchao Wang from Stevens Institute of Technology; Peng Hou and Haihong Tang from Alibaba Group

Abstract: Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading, unfortunately, remains inferior to the one of its counterpart speech recognition, due to the ambiguous nature of its actuations that makes it challenging to extract discriminant features from the lip movement videos. In this paper, we propose a new method, termed as Lip by Speech (LIBS), of which the goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues, which are formidable to be obtained from the subtle movements of the lips, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we utilize an efficacious alignment scheme to handle the inconsistent lengths of the audios and videos, as well as an innovative filtering strategy to refine the speech recognizer’s prediction. The proposed method achieves the new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by a margin of 7.66% and 2.75% in character error rate, respectively.
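
For readers unfamiliar with the metric quoted above, character error rate (CER) is commonly defined as the character-level edit distance between the predicted and reference transcripts, divided by the reference length. The sketch below follows that common definition; whether the paper's evaluation script normalizes text or handles whitespace in the same way is an assumption, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


cer = character_error_rate("lip reading", "lip reeding")
print(f"{cer:.4f}")
```

Here one substituted character out of eleven gives a CER of 1/11, or roughly 9%. Lower is better, so the margins reported in the abstract are reductions in this ratio.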

Paper Five: Neural Academic Paper Generation (arXiv)

Authors: Samet Demir and Uras Mutlu from Boğaziçi University; Özgür Özdemir from Istanbul Bilgi University

Abstract: In this work, we tackle the problem of structured text generation, specifically academic paper generation in LaTeX, inspired by the surprisingly good results of basic character-level language models. Our motivation is using more recent and advanced methods of language modeling on a more complex dataset of LaTeX source files to generate realistic academic papers. Our first contribution is preparing a dataset with LaTeX source files on recent open-source computer vision papers. Our second contribution is experimenting with recent methods of language modeling and text generation such as Transformer and Transformer-XL to generate consistent LaTeX code. We report cross-entropy and bits-per-character (BPC) results of the trained models, and we also discuss interesting points on some examples of the generated LaTeX code.
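
The two metrics named in the abstract are closely related: bits-per-character is the average per-character cross-entropy expressed in base 2. The sketch below assumes the cross-entropy is measured in nats (natural logarithm), which is the usual convention in deep learning libraries but is not confirmed for this paper. The probabilities are made-up illustrative values, not results from the paper.

```python
import math


def average_cross_entropy(true_char_probs):
    """Mean negative log-likelihood (in nats) of the correct characters."""
    return -sum(math.log(p) for p in true_char_probs) / len(true_char_probs)


def bits_per_character(cross_entropy_nats):
    """Convert a per-character cross-entropy from nats to bits."""
    return cross_entropy_nats / math.log(2)


# Hypothetical probabilities a model assigned to each correct next character.
probs = [0.5, 0.25, 0.5, 0.25]
ce = average_cross_entropy(probs)
bpc = bits_per_character(ce)
print(ce, bpc)
```

In this toy case the model spends one bit on two of the characters and two bits on the other two, so the average is 1.5 bits per character.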

Paper Six: Deep Double Descent: Where Bigger Models and More Data Hurt (arXiv)

Authors: Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, and Boaz Barak from Harvard University; Ilya Sutskever from OpenAI

Abstract: We show that a variety of modern deep learning tasks exhibit a “double-descent” phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of train samples actually hurts test performance.

Effect of data augmentation

Epoch-wise double descent for ResNet18 and CNN (width=128)

Paper Seven: A Simple Proof of the Quadratic Formula (arXiv)

Authors: Po-Shen Loh from Carnegie Mellon University

Abstract: This article provides a very simple proof of the quadratic formula. The derivation is computationally light and conceptually natural, and has the potential to demystify the quadratic formula for students worldwide.

Derivation of the traditional quadratic formula with an arbitrary x² coefficient
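
The idea behind the derivation can be sketched as a short program. After dividing by the leading coefficient, the two roots must sum to -B and multiply to C, so they can be written as -B/2 + u and -B/2 - u for some u, and the product condition then fixes u². The code below is an illustrative reading of that argument, not code from the paper; the function name is hypothetical, and `cmath` is used so that negative values of u² yield complex roots.

```python
import cmath


def solve_quadratic(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 via the sum-and-product argument."""
    if a == 0:
        raise ValueError("a must be non-zero")
    # Divide by a so the equation reads x**2 + B*x + C = 0.
    B, C = b / a, c / a
    # The roots sum to -B, so write them as mid + u and mid - u.
    mid = -B / 2
    # Their product must equal C: mid**2 - u**2 = C, hence u**2 = mid**2 - C.
    u = cmath.sqrt(mid * mid - C)
    return mid + u, mid - u


r1, r2 = solve_quadratic(1, -5, 6)
print(r1, r2)
```

For x² - 5x + 6 the midpoint is 2.5 and u² = 6.25 - 6 = 0.25, giving the roots 3 and 2. Substituting B = b/a and C = c/a back into mid ± u recovers the familiar formula.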

Journalist: Yuan Yuan | Editor: Michael Sarazen
