
Heidelberg University Researchers Combine CNNs and Transformers to Synthesize High-Resolution Images

Researchers combine the effectiveness of the inductive bias in CNNs with the expressivity of transformers to model and synthesize high-resolution images.

Is there a way to efficiently code inductive image biases into models while retaining all the flexibility of transformers? “Yes,” say researchers from Germany’s Heidelberg University. In a new paper, the team proposes a novel approach that combines the effectiveness of the inductive bias in convolutional neural networks (CNNs) with the expressivity of transformers to model and synthesize high-resolution images.

Synced recently showcased ten NeurIPS 2020 papers illustrating trends in transformers — ranging from extended use of the neural architecture to innovative advancements in technique, architectural design changes and more — in NeurIPS 2020 | Teaching Transformers New Tricks. Transformer architectures excel in tasks with sequential data; they have revolutionized natural language processing and have recently also been applied to reinforcement learning, computer vision and symbolic mathematics.

A transformer trade-off, however, is that the architecture contains no inductive bias to prioritize local interactions, which makes it computationally infeasible for long sequences such as high-resolution images. The Heidelberg University team’s proposed method addresses this by first using CNNs to learn a context-rich vocabulary of image constituents, then utilizing transformers to efficiently model their composition within the images. The team explains that rather than representing images with pixels, the proposed approach represents them as a composition of perceptually rich image constituents drawn from a codebook of context-rich visual parts.
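The codebook idea can be illustrated with a toy vector-quantization step: each patch feature vector produced by the CNN encoder is replaced by the index of its nearest codebook entry. The sketch below is a minimal, hypothetical illustration (the codebook values, patch vectors, and `quantize` helper are invented for this example, not taken from the paper's implementation):

```python
import math

def quantize(patch_vecs, codebook):
    """Map each patch feature vector to the index of its nearest
    codebook entry (Euclidean distance), as in vector quantization."""
    indices = []
    for v in patch_vecs:
        dists = [math.dist(v, c) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices

# Toy codebook of 4 learned "visual parts" (2-D vectors for illustration;
# in practice the entries are high-dimensional and learned by the CNN).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
patches = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8)]
print(quantize(patches, codebook))  # [0, 1, 2]
```

The resulting integer sequence is what the transformer then models autoregressively, predicting the next codebook index given the previous ones.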

This significantly reduces the description length of compositions, enabling researchers to efficiently model the global interrelations within images using a transformer architecture. The generated images are realistic and consistent at high resolution, in both unconditional and conditional settings.
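To make the reduction in description length concrete, here is a back-of-the-envelope calculation under assumed numbers (a 256×256 image and a CNN encoder that downsamples by a factor of 16; the actual resolutions and factors vary across the paper's experiments):

```python
H = W = 256  # input image resolution (assumed for illustration)
f = 16       # CNN encoder downsampling factor (assumed)

pixels = H * W                # sequence length if modelling raw pixels
tokens = (H // f) * (W // f)  # sequence length over codebook indices
print(pixels, tokens, pixels // tokens)  # 65536 256 256
```

Since self-attention cost grows quadratically with sequence length, shrinking the sequence from tens of thousands of pixels to a few hundred codebook indices is what makes the transformer stage tractable.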


In evaluations, the proposed method outperformed SOTA codebook-based approaches based on convolutional architectures. The team says the CNN and transformer combination “taps into the full potential of their complementary strengths” and represents the first high-resolution image synthesis results with a transformer-based architecture.

The paper Taming Transformers for High-Resolution Image Synthesis is on arXiv. This project is also on GitHub.


Analyst: Yuqing Li | Editor: Michael Sarazen; Fangyu Cai



Synced Report | A Survey of China’s Artificial Intelligence Solutions in Response to the COVID-19 Pandemic — 87 Case Studies from 700+ AI Vendors

This report offers a look at how China has leveraged artificial intelligence technologies in the battle against COVID-19. It is also available on Amazon Kindle. Along with this report, we also introduced a database covering an additional 1,428 artificial intelligence solutions from 12 pandemic scenarios.

Click here to find more reports from us.



We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
