A fundamental problem in computer vision is the reconstruction of the shape and motion of the 3D world. This has applications as varied as self-driving cars, 3D mapping, virtual reality, graphics, and robotics. We think that reasoning about the 3D world and its structure is at the heart of computer vision.
To help push the field in new directions, we have co-organized two workshops on Scenes from Video that bring together researchers working on video, flow, and structure from motion with researchers working on semantic scene analysis. The idea is that integrating these fields (metric and semantic) will lead to improvements in both.
Perceiving Systems is at the forefront of research on optical flow. Flow estimation is one of our core competencies, and our algorithms are regularly at the top of the optical flow benchmarks.
By optical flow we mean the projection of the 3D motion field onto the image plane of the camera. We focus on this (as opposed to apparent motion) because this flow is related to the structure of the 3D scene, the boundaries of objects, and the motion of the camera. Flow is an important mediating representation (an intrinsic image) that helps the analysis of scenes.
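To make "projection of the 3D motion field onto the image plane" concrete, the sketch below assumes an ideal pinhole camera at the origin looking down the Z axis with focal length `f`. Differentiating the projection x = fX/Z, y = fY/Z with respect to time gives the image velocity (u, v) of a moving 3D point. The function names `project` and `motion_field` are hypothetical and chosen for illustration; this is the textbook motion-field relation, not one of our flow estimation algorithms, which must recover this field from images alone.

```python
def project(point, f):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane (assumed model)."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


def motion_field(point, velocity, f):
    """Image velocity (u, v) of a 3D point moving with velocity (VX, VY, VZ).

    Obtained by differentiating x = f X / Z and y = f Y / Z with respect to time.
    """
    X, Y, Z = point
    VX, VY, VZ = velocity
    u = f * (VX * Z - X * VZ) / (Z * Z)
    v = f * (VY * Z - Y * VZ) / (Z * Z)
    return (u, v)


if __name__ == "__main__":
    p, vel, f, dt = (1.0, 2.0, 4.0), (0.5, 0.0, 1.0), 1.0, 1e-6
    # Move the point a tiny step in 3D and compare projected displacements.
    moved = tuple(c + dt * dc for c, dc in zip(p, vel))
    x0, y0 = project(p, f)
    x1, y1 = project(moved, f)
    print(motion_field(p, vel, f))            # analytic: (0.0625, -0.125)
    print(((x1 - x0) / dt, (y1 - y0) / dt))  # finite-difference approximation
```

Note how the image motion depends on depth Z and on the motion toward the camera (VZ). This dependence is why flow carries information about scene structure, object boundaries, and camera motion.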
Optical flow has proven useful for problems throughout computer vision, graphics, medical imaging, robotics, and many other application domains. While there are many reasons to compute flow, the ones that interest us most are to:
- establish correspondence across time, which enables reasoning over time, establishing object permanence, etc.;
- determine scene structure: what is rigid, what isn't, where the boundaries are, etc.
Open problems in the field include: dealing with fast motion of small objects, modeling motion with complex material properties, reflections and transparency, dealing with motion blur, accurately estimating flow at surface boundaries, segmenting scenes into regions, and improving accuracy and speed simultaneously.
Our current work is focused on combining the estimation of flow with higher-level scene analysis, including combining flow with the estimation of 3D objects and their motion, and estimating 3D scene flow. We are also using optical flow in many applications, including human shape and motion analysis.
Beyond motion, we study the recovery and reconstruction of 3D structure from single images, RGB-D data, video sequences, stereo, and multi-view stereo.
Our major innovations lie in combining high-level and semantic cues with low-level features. We view the problem as the integration of model fitting with dense structure recovery. While much of our work has focused on object-specific models like people and cars, we are particularly interested in generic representations and compositional models of objects and scenes.
Increases in computing power, labeled training data, large databases of 3D CAD models, 3D sensors, and open-source rendering engines are all opening new opportunities to model and infer 3D objects and scenes.