We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently. Our key observation is that indoor scene structures are inherently hierarchical. Hence, our network is not convolutional; it is a recursive neural network or RvNN. Using a dataset of annotated scene hierarchies, we train a variational recursive autoencoder, or RvNN-VAE, which performs scene object grouping during its encoding phase and scene generation during decoding. Specifically, a set of encoders is recursively applied to group 3D objects based on support, surround, and co-occurrence relations in a scene, encoding information about objects' spatial properties, semantics, and their relative positioning with respect to other objects in the hierarchy. By training a variational autoencoder (VAE), the resulting fixed-length codes roughly follow a Gaussian distribution. A novel 3D scene can be generated hierarchically by the decoder from a randomly sampled code from the learned distribution. We coin our method GRAINS, for Generative Recursive Autoencoders for INdoor Scenes. We demonstrate the capability of GRAINS to generate plausible and diverse 3D indoor scenes and compare with existing methods for 3D scene synthesis. We show applications of GRAINS including scene modeling from 2D layouts, scene editing, and data enhancement for semantic scene segmentation.
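The recursive encoding idea can be illustrated with a minimal sketch. It is not the GRAINS architecture: it assumes a single binary merge encoder with random, untrained weights, whereas the abstract describes separate encoders for support, surround, and co-occurrence relations. The names `CODE_DIM`, `encode`, and `sample_latent` are hypothetical.

```python
import numpy as np

CODE_DIM = 8  # assumed length of every fixed-length code
_rng = np.random.default_rng(0)
# Random, untrained weights of one hypothetical merge encoder.
W = 0.1 * _rng.standard_normal((CODE_DIM, 2 * CODE_DIM))
b = np.zeros(CODE_DIM)

def encode(node):
    """Recursively collapse a binary scene hierarchy into one code.

    A leaf is a NumPy vector of length CODE_DIM (an object code);
    an internal node is a (left, right) tuple of sub-hierarchies.
    """
    if isinstance(node, np.ndarray):
        return node
    left, right = node
    merged = np.concatenate([encode(left), encode(right)])
    return np.tanh(W @ merged + b)

def sample_latent(mu, log_var, rng):
    """Standard VAE reparameterisation: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

Whatever the shape of the hierarchy, `encode` returns a vector of length `CODE_DIM`, which is what allows a VAE prior to be placed on the root code.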
Path tracing produces realistic results including global illumination using a unified simple rendering pipeline. Reducing the amount of noise to imperceptible levels without post-processing requires thousands of samples per pixel (spp), while currently it is only possible to render extremely noisy 1 spp frames in real time with desktop GPUs. However, post-processing can utilize feature buffers, which contain noise-free auxiliary data available in the rendering pipeline. Previously, regression-based noise filtering methods have only been used in offline rendering due to their high computational cost. In this paper we propose a novel regression-based reconstruction pipeline, called Blockwise Multi-Order Feature Regression (BMFR), tailored for path-traced 1 spp inputs that runs in real time. The high speed is achieved with a fast implementation of augmented QR factorization and by using stochastic regularization to address rank-deficient feature data. The proposed algorithm is 1.8× faster than the previous state-of-the-art real-time path tracing reconstruction method while producing better quality frame sequences.
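The per-block regression can be sketched as follows. This is a simplified illustration, not the BMFR implementation: it uses NumPy's standard reduced QR rather than the augmented QR factorization named in the abstract, it handles one color channel of one block, and applying the fitted weights to the unperturbed features is an assumption. The function name `fit_block` and the parameter `noise_scale` are hypothetical.

```python
import numpy as np

def fit_block(features, noisy, noise_scale=1e-2, rng=None):
    """Fit one noisy color channel of a pixel block to its feature buffers.

    features: (n_pixels, n_features) noise-free auxiliary data
    noisy:    (n_pixels,) 1 spp color values for the same pixels
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Stochastic regularization (assumed form): small zero-mean noise on
    # the features so that duplicate or constant columns do not make the
    # least-squares system rank deficient.
    perturbed = features + noise_scale * rng.standard_normal(features.shape)
    q, r = np.linalg.qr(perturbed)             # reduced QR, r is square
    weights = np.linalg.solve(r, q.T @ noisy)  # least-squares weights
    return features @ weights                  # reconstructed channel
```

A block whose feature matrix contains two identical columns would make an unregularized solve singular; with the perturbation the system stays solvable and the output remains a smooth combination of the features.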
In this paper we present a novel dictionary learning framework designed for compression and sampling of light fields and light field videos. Unlike previous methods, where a single dictionary with one-dimensional atoms is learned, we propose to train a multidimensional dictionary ensemble (MDE). We show that learning an ensemble in the native dimensionality of the data promotes sparsity, hence increasing the compression ratio and sampling efficiency. To make maximum use of correlations within the light field data sets, we also introduce a novel non-local pre-clustering approach that constructs an aggregate MDE (AMDE). The pre-clustering not only improves the image quality, but also reduces the training time by an order of magnitude in most cases. The decoding algorithm supports efficient local reconstruction of the compressed data, which enables efficient real-time playback of high-resolution light field videos. Moreover, we discuss the application of AMDE for compressed sensing. A theoretical analysis is presented which indicates the required conditions for exact recovery of point-sampled light fields that are sparse under AMDE. The analysis provides guidelines for designing efficient compressive light field cameras.
We propose a new algorithm for color transfer between images that have perceptually similar semantic structures. We aim to achieve a more accurate color transfer that leverages semantically-meaningful dense correspondence between images. To accomplish this, our algorithm uses neural representations for matching. Additionally, the color transfer should be spatially-variant and globally coherent. Therefore, our algorithm optimizes a local linear model for color transfer satisfying both local and global constraints. Our proposed approach jointly optimizes matching and color transfer, adopting a coarse-to-fine strategy. The proposed method can be successfully extended from "one-to-one" to "one-to-many" color transfers. The latter further addresses the problem of mismatching elements of the input image. We validate our proposed method by testing it on a large variety of image content.
In this paper we present a novel representation for deformation fields of 3D shapes, by considering the induced changes in the underlying metric. In particular, our approach allows us to represent a deformation field in a coordinate-free way as a linear operator acting on real-valued functions defined on the shape. Such a representation both provides a way to relate deformation fields to other classical functional operators and enables analysis and processing of deformation fields using standard linear-algebraic tools. This opens the door to a wide variety of applications such as deformation design through precise control of metric distortion, joint deformation analysis and coordinate-free deformation transfer without requiring pointwise correspondences. Our method is applicable to both surface and volumetric shape representations and we guarantee the equivalence between the operator-based and standard deformation field representation under mild genericity conditions in the discrete setting. We demonstrate the utility of our approach by comparing it with existing techniques for deformation transfer, and show significant improvement in the presence of approximate, soft maps. We also show how our representation provides a toolbox for problems which are challenging using existing techniques, such as intrinsic symmetrization and injecting extrinsic information into the computation of functional maps.
Many strategies exist for optimizing non-linear distortion energies in geometry and physics applications, but devising an approach that achieves the convergence promised by Newton-type methods remains challenging. In order to guarantee the positive semi-definiteness required by these methods, a numerical eigendecomposition or approximate regularization is usually needed. In this paper, we present analytic expressions for the eigensystems at each quadrature point of a wide range of isotropic distortion energies. These systems can then be used to project energy Hessians to positive semi-definiteness analytically. Unlike previous attempts, our formulation provides compact expressions that are valid both in 2D and 3D, and does not introduce spurious degeneracies. At its core, our approach utilizes the invariants of the stretch tensor that arises from the polar decomposition of the deformation gradient. We provide closed-form expressions for the eigensystems for all these invariants, and use them to systematically derive the eigensystems of any isotropic energy. Our results are suitable for geometry optimization over flat surfaces or volumes, and agnostic to both the choice of discretization and basis function. To demonstrate the efficiency of our approach, we include comparisons against existing methods on common graphics tasks such as surface parameterization and volume deformation.
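For context, the numerical projection that the analytic eigensystems are meant to replace can be sketched as follows. This is the generic eigenvalue-clamping baseline, not the closed-form expressions of the paper; the function name `project_psd` is hypothetical.

```python
import numpy as np

def project_psd(hessian):
    """Project a symmetric matrix onto the positive semi-definite cone
    by clamping its negative eigenvalues to zero."""
    sym = 0.5 * (hessian + hessian.T)       # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)  # numerical eigendecomposition
    clamped = np.maximum(eigvals, 0.0)
    # Rebuild V * diag(clamped) * V^T without forming the diagonal matrix.
    return (eigvecs * clamped) @ eigvecs.T
```

In a Newton-type solver this would run once per quadrature point on the local energy Hessian, which is the cost that analytic eigensystems avoid.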
We present an automatic facial rigging system for generating person-specific 3D facial blendshapes from images in the wild (e.g., Internet images of Hillary Clinton), where the face shape, pose, expressions, and illumination are all unknown. Our system initializes the 3D blendshapes with sparse facial features detected from the input images using a multi-linear model and then refines the blendshapes via per-pixel shading cues with a new blendshape retargeting algorithm. Finally, we introduce a new algorithm for recovering detailed facial features from the input images. To handle large variations of face poses and illuminations in the input images, we also develop a set of failure detection schemes that can robustly filter out inaccurate results in each step. Our method greatly simplifies the 3D facial rigging process and generates a more faithful face shape and expression of the subject than multi-linear model fitting. We validate the robustness and accuracy of our system using images of a dozen subjects that exhibit significant variations of face shapes, poses, expressions, and illuminations.
This paper describes a method for efficiently computing parallel transport of tangent vectors on curved surfaces, or more generally, any vector-valued data on a curved manifold. More precisely, it extends a vector field defined over any region to the rest of the domain via parallel transport along shortest geodesics. This basic operation enables fast, robust algorithms for extrapolating level set velocities, inverting the exponential map, computing geometric medians and Karcher/Fréchet means of arbitrary distributions, constructing centroidal Voronoi diagrams, and finding consistently ordered landmarks. Rather than evaluate parallel transport by explicitly tracing geodesics, we show that it can be computed via a short-time heat flow involving the connection Laplacian. As a result, transport can be achieved by solving three prefactored linear systems, each akin to a standard Poisson problem. Moreover, to implement the method we need only a discrete connection Laplacian, which we describe for a variety of geometric data structures (point clouds, polygon meshes, etc.). We also study the numerical behavior of our method, showing empirically that it converges under refinement, and augment the construction of intrinsic Delaunay triangulations (iDT) so that they can be used in the context of tangent vector field processing.
Imaging objects obscured by occluders is a significant challenge for many applications. A camera that could "see around corners" could help improve navigation and mapping capabilities of autonomous vehicles or make search and rescue missions more effective. Time-resolved single-photon imaging systems have recently been demonstrated to record optical information of a scene that can lead to an estimation of the shape and reflectance of objects hidden from the line of sight of a camera. However, existing non-line-of-sight (NLOS) reconstruction algorithms have been constrained in the types of light transport effects they model for the hidden scene parts. We introduce a factored NLOS light transport representation that accounts for partial occlusions and surface normals. Based on this model, we develop a factorization approach for inverse time-resolved light transport and demonstrate high-fidelity NLOS reconstructions for challenging scenes both in simulation and with an experimental NLOS imaging system.