This is an updated version.
The 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) has announced its best paper awards. One of the world’s top academic conferences in the field of computer vision, CVPR kicked off today as a virtual gathering. This year saw a total of 1,467 papers accepted from a record-high 5,865 valid submissions. The 25 percent acceptance rate is on par with CVPR 2019.
In his opening remarks, Program Chair Ce Liu said, “We expected a big increase in the number of our attendees, but the coronavirus changed everything. Even though CVPR is a completely virtual conference, we have about 7,000 attendees, surpassing the number of 2018.”
Best Paper Award:
Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild
Authors: Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi
Institution(s): University of Oxford
Abstract: We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover very accurately the 3D shape of human faces, cat faces and cars from single-view images, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.
Best Student Paper Award:
BSP-Net: Generating Compact Meshes via Binary Space Partitioning
Authors: Zhiqin Chen, Andrea Tagliasacchi, Hao Zhang
Institution(s): Simon Fraser University, Google Research
Abstract: Polygonal meshes are ubiquitous in the digital 3D domain, yet they have only played a minor role in the deep learning revolution. Leading methods for learning generative models of shapes rely on implicit functions, and generate meshes only after expensive iso-surfacing routines. To overcome these challenges, we are inspired by a classical spatial data structure from computer graphics, Binary Space Partitioning (BSP), to facilitate 3D learning. The core ingredient of BSP is an operation for recursive subdivision of space to obtain convex sets. By exploiting this property, we devise BSP-Net, a network that learns to represent a 3D shape via convex decomposition. Importantly, BSP-Net is unsupervised since no convex shape decompositions are needed for training. The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built on a set of planes. The convexes inferred by BSP-Net can be easily extracted to form a polygon mesh, without any need for iso-surfacing. The generated meshes are compact (i.e., low-poly) and well suited to represent sharp geometry; they are guaranteed to be watertight and can be easily parameterized. We also show that the reconstruction quality by BSP-Net is competitive with state-of-the-art methods while using much fewer primitives. Code is available.
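For readers unfamiliar with the idea of building a shape from planes, the representation described in the abstract can be illustrated with a small sketch. This is not the BSP-Net network or its training procedure. It only shows the underlying geometry: a convex is an intersection of half-spaces, and a shape is a union of convexes. The plane convention (a, b, c, d) and the helper names plane_value, in_convex, in_shape and box are assumptions made for this example.

```python
from typing import List, Sequence

# Assumed convention: a plane (a, b, c, d) defines the half-space
# a*x + b*y + c*z + d <= 0.
Plane = Sequence[float]


def plane_value(plane: Plane, point: Sequence[float]) -> float:
    """Signed value of the plane equation at a point (<= 0 means inside)."""
    *normal, offset = plane
    return sum(n * p for n, p in zip(normal, point)) + offset


def in_convex(planes: List[Plane], point: Sequence[float]) -> bool:
    """A convex is an intersection of half-spaces, so the largest plane value decides."""
    return max(plane_value(pl, point) for pl in planes) <= 0.0


def in_shape(convexes: List[List[Plane]], point: Sequence[float]) -> bool:
    """A shape is a union of convexes, so lying inside any one of them is enough."""
    return any(in_convex(c, point) for c in convexes)


def box(lo: Sequence[float], hi: Sequence[float]) -> List[Plane]:
    """Six half-spaces bounding an axis-aligned box (hypothetical helper)."""
    planes: List[Plane] = []
    for axis in range(3):
        normal = [0.0, 0.0, 0.0]
        normal[axis] = 1.0
        planes.append((*normal, -hi[axis]))                # coordinate <= hi
        planes.append((*(-n for n in normal), lo[axis]))   # coordinate >= lo
    return planes


# An L-shaped solid written as the union of two boxes (two convexes).
l_shape = [box([0, 0, 0], [2, 1, 1]), box([0, 0, 0], [1, 2, 1])]

print(in_shape(l_shape, (1.5, 0.5, 0.5)))  # inside the first box
print(in_shape(l_shape, (0.5, 1.5, 0.5)))  # inside the second box
print(in_shape(l_shape, (1.5, 1.5, 0.5)))  # in the notch, outside both
```

Because each convex is bounded by explicit planes, its faces can be read off directly as polygons, which is the property the abstract points to when it says no iso-surfacing is needed.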
Best Student Paper Honorable Mention:
DeepCap: Monocular Human Performance Capture Using Weak Supervision
Authors: Marc Habermann, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt
Institution(s): Max Planck Institute for Informatics, Saarland Informatics Campus, Stanford University
Abstract: Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision completely removing the need for training data with 3D ground truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness.
PAMI Longuet-Higgins Prize (Retrospective Highest Impact Paper from CVPR 2010):
Secrets of Optical Flow Estimation and Their Principles
Authors: Deqing Sun, Stefan Roth, Michael J. Black
Institution(s): Brown University, TU Darmstadt
Abstract: The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. Moreover, we find that while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions. To understand the principles behind this phenomenon, we derive a new objective that formalizes the median filtering heuristic. This objective includes a nonlocal term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that ranks at the top of the Middlebury benchmark.
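The median filtering heuristic mentioned in the abstract can be illustrated with a minimal sketch. It applies one pass of a plain median filter to the horizontal and vertical components of a flow field, which suppresses isolated outliers. It is not the paper's objective, its nonlocal term, or its optimization scheme. The function names median_filter and filter_flow, the window radius, and the edge handling by clamping are all assumptions made for this example.

```python
from statistics import median
from typing import List, Tuple

Field = List[List[float]]  # one flow component stored as rows of values


def median_filter(field: Field, radius: int = 1) -> Field:
    """Replace each value by the median of its (2*radius+1)^2 neighborhood.

    Neighbors that fall outside the field are clamped to the nearest edge value.
    """
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                field[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
            ]
            out[r][c] = median(window)
    return out


def filter_flow(u: Field, v: Field, radius: int = 1) -> Tuple[Field, Field]:
    """Filter the horizontal (u) and vertical (v) flow components independently."""
    return median_filter(u, radius), median_filter(v, radius)


# A smooth flow field with a single outlier in the horizontal component.
u = [[1.0, 1.0, 1.0],
     [1.0, 9.0, 1.0],
     [1.0, 1.0, 1.0]]
v = [[2.0, 2.0, 2.0],
     [2.0, 2.0, 2.0],
     [2.0, 2.0, 2.0]]

u_filtered, v_filtered = filter_flow(u, v)
print(u_filtered[1][1])  # the outlier is replaced by its neighborhood median
```

In the setting the abstract describes, a step like this would run on intermediate flow estimates during optimization rather than once on a finished result.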
Additional 2007 PAMI Longuet-Higgins Prize (Retrospective Highest Impact Paper from CVPR 2007):
Accurate, Dense, and Robust Multi-View Stereopsis
Authors: Yasutaka Furukawa, Jean Ponce
Institution(s): University of Illinois at Urbana-Champaign, Ecole Normale Supérieure
Abstract: This paper proposes a novel algorithm for calibrated multi-view stereopsis that outputs a (quasi) dense set of rectangular patches covering the surfaces visible in the input images. This algorithm does not require any initialization in the form of a bounding volume, and it detects and discards automatically outliers and obstacles. It does not perform any smoothing across nearby features, yet is currently the top performer in terms of both coverage and accuracy for four of the six benchmark datasets presented in . The keys to its performance are effective techniques for enforcing local photometric consistency and global visibility constraints. Stereopsis is implemented as a match, expand, and filter procedure, starting from a sparse set of matched keypoints, and repeatedly expanding these to nearby pixel correspondences before using visibility constraints to filter away false matches. A simple but effective method for turning the resulting patch model into a mesh appropriate for image-based modeling is also presented. The proposed approach is demonstrated on various datasets including objects with fine surface details, deep concavities, and thin structures, outdoor scenes observed from a restricted set of viewpoints, and “crowded” scenes where moving obstacles appear in different places in multiple images of a static structure of interest.
CVPR 2020 comprises a main conference along with several workshops and short courses and runs virtually through June 19.
Journalist: Fangyu Cai | Editor: Michael Sarazen