
Interactive Multi-Dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration

Deep learning methods have been widely used in image restoration problems, and most of them focus on a specific restoration task, e.g., denoising or deblurring. For a given input, these methods generate a fixed output with a pre-determined restoration level.

They therefore lack the flexibility to alter the output according to each user's preferences. This flexibility is essential in many image processing applications, where users want to adjust the restoration strength continuously with a slider.

To adapt conventional deep models to real scenarios, the XPixel team at the Shenzhen Institutes of Advanced Technology (SIAT) of the Chinese Academy of Sciences investigated the use of additional branches to tune imagery effects. The outputs of their networks could be controlled interactively by a single variable at test time, without retraining on new datasets. The networks could generate continuous restoration results between a pre-defined start level and end level (e.g., JPEG quality q40→q10).

The study, titled “Modulating Image Restoration with Continual Levels via Adaptive Feature Modification Layers”, was published at the Conference on Computer Vision and Pattern Recognition (CVPR) 2019.

Available at: https://arxiv.org/abs/1904.08118

This pioneering modulation work assumed that the input image has only a single degradation type, e.g., noise or blur, so the modulation lies in one dimension. Real-world scenarios, however, are more complicated. Real images usually contain multiple types of degradation at once, e.g., noise and blur, and users then need a separate control for each of them.

More recently, the team presented a new problem setup, called multi-dimension (MD) modulation, which aims to modulate output effects across multiple degradation types and levels. They proposed the first MD modulation framework, built on dynamic Controllable Residual learning and called CResMD. Specifically, a controllable variable was added to the conventional residual connection to allow a weighted summation of the input and the residual. The values of these weights were generated by a separate condition network. Given the corrupted image and the degradation information as inputs, the network outputs the corresponding restored image. By tweaking the condition vector, users can control the output effects in MD space at test time. Experimental results showed that CResMD realizes MD modulation with high accuracy and outperforms existing approaches on single-dimension (SD) modulation tasks while adding far fewer parameters (0.16%).
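
The controllable residual connection can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names, the toy residual branch, the degradation ranges and the hand-picked fully-connected weights are all assumptions made for illustration. It only shows the idea of a condition network turning normalized degradation levels into a weight α, which then scales the residual before it is added to the input.

```python
def normalize_condition(levels, max_levels):
    """Scale each degradation level to [0, 1]. The ranges are assumed here."""
    return [min(max(level / top, 0.0), 1.0) for level, top in zip(levels, max_levels)]


def condition_network(cond, weights, biases):
    """One fully-connected layer: condition vector -> one alpha per residual connection."""
    return [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(weights, biases)]


def controllable_residual(x, residual_fn, alpha):
    """Weighted summation of input and residual: y = x + alpha * f(x)."""
    return [xi + alpha * ri for xi, ri in zip(x, residual_fn(x))]


def toy_residual(x):
    """Stand-in for the learned residual branch: pulls every sample toward the mean."""
    mean = sum(x) / len(x)
    return [mean - xi for xi in x]


def restore(x, levels, max_levels, fc_weights, fc_biases):
    """Run the condition network, then apply a single global controllable residual."""
    cond = normalize_condition(levels, max_levels)
    alpha = condition_network(cond, fc_weights, fc_biases)[0]
    return controllable_residual(x, toy_residual, alpha)


# Hypothetical setup: two degradation dimensions (blur radius, noise sigma).
MAX_LEVELS = (4.0, 50.0)      # assumed ranges, not taken from the paper
FC_WEIGHTS = [[0.5, 0.5]]     # hand-picked, not trained
FC_BIASES = [0.0]

noisy = [0.2, 0.8, 0.5]
identity = restore(noisy, (0.0, 0.0), MAX_LEVELS, FC_WEIGHTS, FC_BIASES)   # alpha = 0
halfway = restore(noisy, (2.0, 25.0), MAX_LEVELS, FC_WEIGHTS, FC_BIASES)   # alpha = 0.5
full = restore(noisy, (4.0, 50.0), MAX_LEVELS, FC_WEIGHTS, FC_BIASES)      # alpha = 1
```

With α at 0 the residual branch is disabled and the input passes through unchanged; raising the condition values moves the output continuously toward the fully restored result. In the actual framework the residual branch and the condition network are learned, and a weight is generated for each controllable residual connection rather than for a single global one.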

This study, titled “Interactive Multi-Dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration”, was published at the European Conference on Computer Vision (ECCV) 2020.

Available at: https://arxiv.org/abs/1912.05293

Fig. 1 Framework of CResMD, consisting of two branches: a base network and a condition network. The base network handles image restoration, while the condition network generates the weights α for the controllable residual connections. The condition network contains several fully-connected layers and accepts the normalized restoration information as input. The building block (green) can be replaced by any existing block, such as a residual attention block or a dense block.
Fig. 2 Different levels of restoration effects by setting different weights α on global residual. When α=1, the network outputs the restored image. To achieve identity mapping, we set α=0 to disable the residual branch.

To evaluate the modulation performance on one degradation combination (e.g., blur r2 + noise σ30), a baseline model using the architecture of the base network was trained on that degradation alone; its result can be regarded as an upper bound. The experimental results indicated high modulation accuracy when two types of degradation were present. For a single degradation, the performance decreased slightly but remained within an acceptable range. Compared with previous single-dimension (SD) modulation works, CResMD also yielded superior results on the single-degradation modulation problem.
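
One way to quantify the gap to such a baseline is the difference in PSNR between the baseline output and the modulated output, both measured against the clean image. The sketch below uses the standard PSNR definition; the function names, the flat-list image representation and the peak value of 1.0 are assumptions for illustration, not the authors' evaluation code.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_distance(reference, baseline_output, modulated_output, peak=1.0):
    """Gap in dB between the dedicated baseline and the modulated model (lower is better)."""
    return psnr(reference, baseline_output, peak) - psnr(reference, modulated_output, peak)


# Toy pixel values in [0, 1]; real evaluation would run over full images.
clean = [0.5, 0.5]
baseline_out = [0.6, 0.4]
modulated_out = [0.7, 0.3]
gap_db = psnr_distance(clean, baseline_out, modulated_out)
```

A small distance means the single modulated network comes close to a model trained specifically for that degradation combination.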

Table. 2D experiments evaluated on CBSD68. The PSNR distances within 0.2 dB are shown in bold. Lower is better.
Fig. 3. Quantitative comparison with SD methods on CBSD68 data set in PSNR.
Fig. 4. Qualitative results of MD modulation. In each row, only one factor is changed while the others are kept fixed. The best choice is marked by the yellow box. Best viewed zoomed in and in color.

The team also identified some research limitations. Although CResMD could realize modulation across multiple domains, the performance could be further improved. The controlling method could be more accurate and diverse.

The current method also faces the problem of the “dimension curse”: as the number of dimensions increases, the accuracy decreases. As a next step, it is desirable to find better solutions for higher-dimension modulation. All in all, the modulation problem is a new and interesting direction. The proposed techniques, such as AdaFM and the controllable residual connection, are also potentially useful for other computer vision tasks.


About Dr. Chao Dong

Chao Dong is currently an associate professor at SIAT, CAS. He received his Ph.D. degree from The Chinese University of Hong Kong in 2016. In 2014, he first introduced a deep learning method, the Super-Resolution Convolutional Neural Network (SRCNN), into the super-resolution field. His team has won first place in several international super-resolution challenges: NTIRE2018, PIRM2018, NTIRE2019 and AIM2020. He worked at SenseTime from 2016 to 2018 as the team leader of the Image Super-Resolution Group. Together with his team, he developed the first deep-learning-based “digital zoom” for smartphone cameras. His current research focuses on low-level vision problems, such as image/video super-resolution, denoising and enhancement.

Views expressed in this article do not represent the opinion of Synced Review or its editors.

