Humans are very good at recognizing objects as well as the materials that they are made of. We can easily tell cheese from butter, silk from linen and snow from ice just by looking. Understanding material perception is important for many real-world applications. For instance, a robot cooking in the kitchen will benefit from the knowledge of material perception when deciding if food is cooked or raw. In this talk, I will present studies that are motivated by two important applications of material perception: online shopping and computer graphics (CG) rendering. First, I will discuss the image cues that allow humans to infer tactile and mechanical information about deformable materials. I will present an experiment in which subjects were asked to match their tactile and visual perception of fabrics. I will show that image cues such as 3D folds and color are important for predicting subjects' tactile perception. Not only do these findings have immediate practical implications (e.g., improving online shopping interfaces for fabrics), but they also have theoretical implications: image-based visual cues affect tactile perception. Second, I will present a project on the visual perception of translucent materials (e.g., wax, milk, and jade) using computer-rendered stimuli. Humans are very sensitive to subtle differences in translucency (e.g., baby skin vs. adult skin), however, it is difficult to render translucent materials realistically. I will show how we measured the perceptual dimensions of physical scattering parameter space and used those measurements to produce more realistic renderings of materials like marble and jade. Taken together, my findings highlight the importance of material perception in the real world, and demonstrate how human perception can contribute to applications in computer vision and graphics.
Sensors acquire an increasing amount of diverse information posing two challenges. Firstly, how can we efficiently deal with such a big amount of data and secondly, how can we benefit from this diversity? In this talk I will first present an approach to deal with large graphical models. The presented method distributes and parallelizes the computation and memory requirements while preserving convergence and optimality guarantees of existing inference and learning algorithms. I will demonstrate the effectiveness of the approach on stereo reconstruction from high-resolution imagery. In the second part I will present a unified framework for structured prediction with latent variables which includes hidden conditional random fields and latent structured support vector machines as special cases. This framework allows to linearly combine different sources of information and I will demonstrate its efficacy on the problem of estimating the 3D room layout given a single image. For the latter problem I will in a third part introduce a globally optimal yet efficient inference algorithm based on branch-and-bound.
Consumer level depth cameras such as Kinect have changed the landscape of 3D computer vision. In this talk we will discuss two approaches that both learn to directly infer correspondences between observed depth image pixels and 3D model points. These correspondences can then be used to drive an optimization of a generative model to explain the data. The first approach, the "Vitruvian Manifold", aims to fit an articulated 3D human model to a depth camera image, and extends our original Body Part Recognition algorithm used in Kinect. It applies a per-pixel regression forest to infer direct correspondences between image pixels and points on a human mesh model. This allows an efficient “one-shot” continuous optimization of the model parameters to recover the human pose. The second approach, "Scene Coordinate Regression", addresses the problem of camera pose relocalization. It uses a similar regression forest, but now aims to predict correspondences between observed image pixels and 3D world coordinates in an arbitrary 3D scene. These correspondences are again used to drive an efficient optimization of the camera pose to a highly accurate result from a single input frame.
Developing autonomous systems that are able to assist humans in everyday's tasks is one of the grand challenges in modern computer science. Notable examples are personal robotics for the elderly and people with disabilities, as well as autonomous driving systems which can help decrease fatalities caused by traffic accidents. In order to perform tasks such as navigation, recognition and manipulation of objects, these systems should be able to efficiently extract 3D knowledge of their environment. In this talk, I'll show how Markov random fields provide a great mathematical formalism to extract this knowledge. In particular, I'll focus on a few examples, i.e., 3D reconstruction, 3D layout estimation, 2D holistic parsing and object detection, and show representations and inference strategies that allow us to achieve state-of-the-art performance as well as several orders of magnitude speed-ups.
Motion capture and data driven technologies have come very far over the past few years. In terms of human capture the high volume of research that has gone into this sub group has led to very impressive results. Human motion can now be captured in real time which when used in the creative sectors can lead to blockbuster films such as Avatar. Similarly in the medical sectors these techniques can be used to diagnose, analyse performance and avoid invasive procedures in tasks such as deformity correction. There is, however, very little research on motion capture of animals. While the technology for capturing animal motion exists, the method used is inefficient, unreliable and limited, as much manual work is required to turn blocked out motions into acceptable results. How we move forward with a suitable procedure however is the major question. Do we extend the life of marker based capture or do we move towards the holy grail of markerless tracking? In this talk we look at a possible solution suitable for both possibilities through physically based simulation techniques. It is our belief that such techniques could help cross the gap in the uncanny valley as far as marker based capture is concerned but also be useful as far as markerless tracking is concerned.
Non-blind deblurring is an integral component of blind approaches for removing image blur due to camera shake. Even though learning-based deblurring methods exist, they have been limited to the generative case and are computationally expensive. To this date, manually-defined models are thus most widely used, though limiting the attained restoration quality. We address this gap by proposing a discriminative approach for non-blind deblurring. One key challenge is that the blur kernel in use at test time is not known in advance. To address this, we analyze existing approaches that use half-quadratic regularization. From this analysis, we derive a discriminative model cascade for image deblurring. Our cascade model consists of a Gaussian CRF at each stage, based on the recently introduced regression tree fields. We train our model by loss minimization and use synthetically generated blur kernels to generate training data. Our experiments show that the proposed approach is efficient and yields state-of-the-art restoration quality on images corrupted with synthetic and real blur.
Irregular triangle meshes are a powerful digital shape representation: they are flexible and can represent virtually any complex shape; they are efficiently rendered by graphics hardware; they are the standard output of 3D acquisition and routinely used as input to simulation software. Yet irregular meshes are difficult to model and edit because they lack a higher-level control mechanism. In this talk, I will survey a series of research results on surface modeling with meshes and show how high-quality shapes can be manipulated in a fast and intuitive manner. I will outline the current challenges in intelligent and more user-friendly modeling metaphors and will attempt to suggest possible directions for future work in this area.
3D reconstruction from images has been a tremendous success-story of computer vision, with city-scale reconstruction now a reality. However, these successes apply almost exclusively in a static world, where the only motion is that of the camera. Even with the advent of realtime depth cameras, full 3D modelling of dynamic scenes lags behind the rigid-scene case, and for many objects of interest (e.g. animals moving in natural environments), depth sensing remains challenging. In this talk, I will discuss a range of recent work in the modelling of nonrigid real-world 3D shape from 2D images, for example building generic animal models from internet photo collections. While the state of the art depends heavily on dense point tracks from textured surfaces, it is rare to find suitably textured surfaces: most animals are limited in texture (think of dogs, cats, cows, horses, …). I will show how this assumption can be relaxed by incorporating the strong constraints given by the object’s silhouette.
Significant progress has been made over the last years in estimating people's shape and motion from video and nonetheless the problem still remains unsolved. This is especially true in uncontrolled environments such as people in the streets or the office where background clutter and occlusions make the problem even more challenging.
The goal of our research is to develop computational methods that enable human pose estimation from video and inertial sensors in indoor and outdoor environments. Specifically, I will focus on one of our past projects in which we introduce a hybrid Human Motion Capture system that combines video input with sparse inertial sensor input. Employing a particle-based optimization scheme, our idea is to use orientation cues derived from the inertial input to sample particles from the manifold of valid poses. Additionally, we introduce a novel sensor noise model to account for uncertainties based on the von Mises-Fisher distribution. Doing so, orientation constraints are naturally fulfilled and the number of needed particles can be kept very small. More generally, our method can be used to sample poses that fulfill arbitrary orientation or positional kinematic constraints. In the experiments, we show that our system can track even highly dynamic motions in an outdoor environment with changing illumination, background clutter, and shadows.