360 results (BibTeX)

2017


Thumb md teaser reflectance filtering
Reflectance Adaptive Filtering Improves Intrinsic Image Estimation

Nestmeyer, T., Gehler, P.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

pre-print Project Page [BibTex]

2017

pre-print Project Page [BibTex]


Thumb md vpn teaser
Video Propagation Networks

Jampani, V., Gadde, R., Gehler, P.

In IEEE Conference on Computer Vision and Patter Recognition (CVPR), 2017 (inproceedings)

arXiv Preprint [BibTex]

arXiv Preprint [BibTex]


Thumb md web teaser
Detailed, accurate, human shape estimation from clothed 3D scan sequences

Zhang, C., Pujades, S., Black, M. J., Pons-Moll, G.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, Spotlight (inproceedings)

Abstract
We address the problem of estimating human body shape from 3D scans over time. Reliable estimation of 3D body shape is necessary for many applications including virtual try-on, health monitoring, and avatar creation for virtual reality. Scanning bodies in minimal clothing, however, presents a practical barrier to these applications. We address this problem by estimating body shape under clothing from a sequence of 3D scans. Previous methods that have exploited statistical models of body shape produce overly smooth shapes lacking personalized details. In this paper we contribute a new approach to recover not only an approximate shape of the person, but also their detailed shape. Our approach allows the estimated shape to deviate from a parametric model to fit the 3D scans. We demonstrate the method using high quality 4D data as well as sequences of visual hulls extracted from multi-view images. We also make available a new high quality 4D dataset that enables quantitative evaluation. Our method outperforms the previous state of the art, both qualitatively and quantitatively.

arxiv_preprint [BibTex]

arxiv_preprint [BibTex]


Thumb md web teaser
Dynamic FAUST: Registering Human Bodies in Motion

Bogo, F., Romero, J., Pons-Moll, G., Black, M. J.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, Oral (inproceedings)

[BibTex]

coming soon [BibTex]

2016


Thumb md cover
Dynamic baseline stereo vision-based cooperative target tracking

Ahmad, A., Ruff, E., Bülthoff, H.

In pages: 1728-1734, IEEE, 2016 (inproceedings)

Abstract
In this article we present a new method for multi-robot cooperative target tracking based on dynamic baseline stereo vision. The core novelty of our approach includes a computationally light-weight scheme to compute the 3D stereo measurements that exactly satisfy the epipolar constraints and a covariance intersection (CI)-based method to fuse the 3D measurements obtained by each individual robot. Using CI we are able to systematically integrate the robot localization uncertainties as well as the uncertainties in the measurements generated by the monocular camera images from each individual robot into the resulting stereo measurements. Through an extensive set of simulation and real robot results we show the robustness and accuracy of our approach with respect to ground truth. The source code related to this article is publicly accessible on our website and the datasets are available on request.

[BibTex]

2016

[BibTex]


Thumb md teaser
Deep Discrete Flow

Güney, F., Geiger, A.

Asian Conference on Computer Vision (ACCV), 2016 (conference) Accepted

pdf suppmat [BibTex]

pdf suppmat [BibTex]


Thumb md gadde
Superpixel Convolutional Networks using Bilateral Inceptions

Gadde, R., Jampani, V., Kiefel, M., Kappler, D., Gehler, P.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Springer, October 2016 (inproceedings)

Abstract
In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new “bilateral inception” module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module are learned end-to-end using standard backpropagation techniques. The bilateral inception module addresses two issues that arise with general CNN segmentation architectures. First, this module propagates information between (super) pixels while respecting image edges, thus using the structured information of the problem for improved results. Second, the layer recovers a full resolution segmentation result from the lower resolution solution of a CNN. In the experiments, we modify several existing CNN architectures by inserting our inception modules between the last CNN (1 × 1 convolution) layers. Empirical results on three different datasets show reliable improvements not only in comparison to the baseline networks, but also in comparison to several dense-pixel prediction techniques such as CRFs, while being competitive in time.

pdf supplementary poster [BibTex]

pdf supplementary poster [BibTex]


Thumb md smplify
Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M. J.

In Computer Vision – ECCV 2016, Lecture Notes in Computer Science, Springer International Publishing, October 2016 (inproceedings)

Abstract
We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occlusion, clothing, lighting, and the inherent ambiguity in inferring 3D from 2D. To solve this, we fi rst use a recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D body joint locations. We then fit (top-down) a recently published statistical body shape model, called SMPL, to the 2D joints. We do so by minimizing an objective function that penalizes the error between the projected 3D model joints and detected 2D joints. Because SMPL captures correlations in human shape across the population, we are able to robustly fi t it to very little data. We further leverage the 3D model to prevent solutions that cause interpenetration. We evaluate our method, SMPLify, on the Leeds Sports, HumanEva, and Human3.6M datasets, showing superior pose accuracy with respect to the state of the art.

pdf Video Sup Mat video Code Project [BibTex]

pdf Video Sup Mat video Code Project [BibTex]


Thumb md jun teaser
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

Xie, J., Kiefel, M., Sun, M., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive and only little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a probabilistic model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels.

pdf suppmat Project Page [BibTex]

pdf suppmat Project Page [BibTex]


Thumb md capital
Patches, Planes and Probabilities: A Non-local Prior for Volumetric 3D Reconstruction

Ulusoy, A., Black, M. J., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
In this paper, we propose a non-local structured prior for volumetric multi-view 3D reconstruction. Towards this goal, we present a novel Markov random field model based on ray potentials in which assumptions about large 3D surface patches such as planarity or Manhattan world constraints can be efficiently encoded as probabilistic priors. We further derive an inference algorithm that reasons jointly about voxels, pixels and image segments, and estimates marginal distributions of appearance, occupancy, depth, normals and planarity. Key to tractable inference is a novel hybrid representation that spans both voxel and pixel space and that integrates non-local information from 2D image segmentations in a principled way. We compare our non-local prior to commonly employed local smoothness assumptions and a variety of state-of-the-art volumetric reconstruction baselines on challenging outdoor scenes with textureless and reflective surfaces. Our experiments indicate that regularizing over larger distances has the potential to resolve ambiguities where local regularizers fail.

YouTube pdf poster suppmat Project Page [BibTex]


Thumb md tsaiteaser
Video segmentation via object flow

Tsai, Y., Yang, M., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Video object segmentation is challenging due to fast moving objects, deforming shapes, and cluttered backgrounds. Optical flow can be used to propagate an object segmentation over time but, unfortunately, flow is often inaccurate, particularly around object boundaries. Such boundaries are precisely where we want our segmentation to be accurate. To obtain accurate segmentation across time, we propose an efficient algorithm that considers video segmentation and optical flow estimation simultaneously. For video segmentation, we formulate a principled, multiscale, spatio-temporal objective function that uses optical flow to propagate information between frames. For optical flow estimation, particularly at object boundaries, we compute the flow independently in the segmented regions and recompose the results. We call the process object flow and demonstrate the effectiveness of jointly optimizing optical flow and video segmentation using an iterative scheme. Experiments on the SegTrack v2 and Youtube-Objects datasets show that the proposed algorithm performs favorably against the other state-of-the-art methods.

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb md futeaser
Occlusion boundary detection via deep exploration of context

Fu, H., Wang, C., Tao, D., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Occlusion boundaries contain rich perceptual information about the underlying scene structure. They also provide important cues in many visual perception tasks such as scene understanding, object recognition, and segmentation. In this paper, we improve occlusion boundary detection via enhanced exploration of contextual information (e.g., local structural boundary patterns, observations from surrounding regions, and temporal context), and in doing so develop a novel approach based on convolutional neural networks (CNNs) and conditional random fields (CRFs). Experimental results demonstrate that our detector significantly outperforms the state-of-the-art (e.g., improving the F-measure from 0.62 to 0.71 on the commonly used CMU benchmark). Last but not least, we empirically assess the roles of several important components of the proposed detector, so as to validate the rationale behind this approach.

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb md teaser
DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation

Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P., Schiele, B.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
This paper considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity of each other. This joint formulation is in contrast to previous strategies, that address the problem by first detecting people and subsequently estimating their body pose. We propose a partitioning and labeling formulation of a set of body-part hypotheses generated with CNN-based part detectors. Our formulation, an instance of an integer linear program, implicitly performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts respecting geometric and appearance constraints. Experiments on four different datasets demonstrate state-of-the-art results for both single person and multi person pose estimation.

code pdf supplementary [BibTex]

code pdf supplementary [BibTex]


Thumb md tes cvpr16 bilateral
Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks

Jampani, V., Kiefel, M., Gehler, P.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 4452-4461, June 2016 (inproceedings)

Abstract
Bilateral filters have wide spread use due to their edge-preserving properties. The common use case is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we will generalize the parametrization and in particular derive a gradient descent algorithm so the filter parameters can be learned from data. This derivation allows to learn high dimensional linear filters that operate in sparsely populated feature spaces. We build on the permutohedral lattice construction for efficient filtering. The ability to learn more general forms of high-dimensional filters can be used in several diverse applications. First, we demonstrate the use in applications where single filter applications are desired for runtime reasons. Further, we show how this algorithm can be used to learn the pairwise potentials in densely connected conditional random fields and apply these to different image segmentation tasks. Finally, we introduce layers of bilateral filters in CNNs and propose bilateral neural networks for the use of high-dimensional sparse data. This view provides new ways to encode model structure into network architectures. A diverse set of experiments empirically validates the usage of general forms of filters.

Project Webpage Code CVF open-access pdf supplementary Poster [BibTex]


Thumb md header
Optical Flow with Semantic Segmentation and Localized Layers

Sevilla-Lara, L., Sun, D., Jampani, V., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Existing optical flow methods make generic, spatially homogeneous, assumptions about the spatial structure of the flow. In reality, optical flow varies across an image depending on object class. Simply put, different objects move differently. Here we exploit recent advances in static semantic scene segmentation to segment the image into objects of different types. We define different models of image motion in these regions depending on the type of object. For example, we model the motion on roads with homographies, vegetation with spatially smooth flow, and independently moving objects like cars and planes with affine motion plus deviations. We then pose the flow estimation problem using a novel formulation of localized layers, which addresses limitations of traditional layered models for dealing with complex scene motion. Our semantic flow method achieves the lowest error of any published monocular method in the KITTI-2015 flow benchmark and produces qualitatively better flow and segmentation than recent top methods on a wide range of natural videos.

video Kitti Precomputed Data (1.6GB) pdf YouTube Sequences Code Project Page [BibTex]


Thumb md appealingavatarsbig
Appealing female avatars from 3D body scans: Perceptual effects of stylization

Fleming, R., Mohler, B., Romero, J., Black, M. J., Breidt, M.

In 11th Int. Conf. on Computer Graphics Theory and Applications (GRAPP), Febuary 2016 (inproceedings)

Abstract
Advances in 3D scanning technology allow us to create realistic virtual avatars from full body 3D scan data. However, negative reactions to some realistic computer generated humans suggest that this approach might not always provide the most appealing results. Using styles derived from existing popular character designs, we present a novel automatic stylization technique for body shape and colour information based on a statistical 3D model of human bodies. We investigate whether such stylized body shapes result in increased perceived appeal with two different experiments: One focuses on body shape alone, the other investigates the additional role of surface colour and lighting. Our results consistently show that the most appealing avatar is a partially stylized one. Importantly, avatars with high stylization or no stylization at all were rated to have the least appeal. The inclusion of colour information and improvements to render quality had no significant effect on the overall perceived appeal of the avatars, and we observe that the body shape primarily drives the change in appeal ratings. For body scans with colour information, we found that a partially stylized avatar was most effective, increasing average appeal ratings by approximately 34%.

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb md testbed
Moving-horizon Nonlinear Least Squares-based Multirobot Cooperative Perception

Ahmad, A., Bülthoff, H.

In pages: 1-8, IEEE, 2015 (inproceedings)

Abstract
In this article we present an online estimator for multirobot cooperative localization and target tracking based on nonlinear least squares minimization. Our method not only makes the rigorous optimization-based approach applicable online but also allows the estimator to be stable and convergent. We do so by employing a moving horizon technique to nonlinear least squares minimization and a novel design of the arrival cost function that ensures stability and convergence of the estimator. Through an extensive set of real robot experiments, we demonstrate the robustness of our method as well as the optimality of the arrival cost function. The experiments include comparisons of our method with i) an extended Kalman filter-based online-estimator and ii) an offline-estimator based on full-trajectory nonlinear least squares.

DOI [BibTex]


Thumb md mbot
Towards Optimal Robot Navigation in Urban Homes

Ventura, R., Ahmad, A.

In RoboCup 2014: Robot World Cup XVIII, pages: 318-331, Lecture Notes in Computer Science ; 8992, Springer, Cham, Switzerland, 2015 (inproceedings)

Abstract
The work presented in this paper is motivated by the goal of dependable autonomous navigation of mobile robots. This goal is a fundamental requirement for having autonomous robots in spaces such as domestic spaces and public establishments, left unattended by technical staff. In this paper we tackle this problem by taking an optimization approach: on one hand, we use a Fast Marching Approach for path planning, resulting in optimal paths in the absence of unmapped obstacles, and on the other hand we use a Dynamic Window Approach for guidance. To the best of our knowledge, the combination of these two methods is novel. We evaluate the approach on a real mobile robot, capable of moving at high speed. The evaluation makes use of an external ground truth system. We report controlled experiments that we performed, including the presence of people moving randomly nearby the robot. In our long term experiments we report a total distance of 18 km traveled during 11 hours of movement time.

DOI [BibTex]

DOI [BibTex]


A Setup for multi-UAV hardware-in-the-loop simulations

Odelga, M., Stegagno, P., Bülthoff, H., Ahmad, A.

In pages: 204-210, IEEE, nov 2015 (inproceedings)

Abstract
In this paper, we present a hardware in the loop simulation setup for multi-UAV systems. With our setup, we are able to command the robots simulated in Gazebo, a popular open source ROS-enabled physical simulator, using the computational units that are embedded on our quadrotor UAVs. Hence, we can test in simulation not only the correct execution of algorithms, but also the computational feasibility directly on the robot hardware. In addition, since our setup is inherently multi-robot, we can also test the communication flow among the robots. We provide two use cases to show the characteristics of our setup.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Thumb md result overlayed
Onboard robust person detection and tracking for domestic service robots

Sanz, D., Ahmad, A., Lima, P.

In Robot 2015: Second Iberian Robotics Conference, pages: 547-559, Advances in Intelligent Systems and Computing ; 418, Springer, Cham, Switzerland, 2015 (inproceedings)

Abstract
Domestic assistance for the elderly and impaired people is one of the biggest upcoming challenges of our society. Consequently, in-home care through domestic service robots is identified as one of the most important application area of robotics research. Assistive tasks may range from visitor reception at the door to catering for owner's small daily necessities within a house. Since most of these tasks require the robot to interact directly with humans, a predominant robot functionality is to detect and track humans in real time: either the owner of the robot or visitors at home or both. In this article we present a robust method for such a functionality that combines depth-based segmentation and visual detection. The robustness of our method lies in its capability to not only identify partially occluded humans (e.g., with only torso visible) but also to do so in varying lighting conditions. We thoroughly validate our method through extensive experiments on real robot datasets and comparisons with the ground truth. The datasets were collected on a home-like environment set up within the context of RoboCup@Home and RoCKIn@Home competitions.

DOI [BibTex]

DOI [BibTex]


Thumb md philip
FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation

Lenz, P., Geiger, A., Urtasun, R.

In International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
One of the most popular approaches to multi-target tracking is tracking-by-detection. Current min-cost flow algorithms which solve the data association problem optimally have three main drawbacks: they are computationally expensive, they assume that the whole video is given as a batch, and they scale badly in memory and computation with the length of the video sequence. In this paper, we address each of these issues, resulting in a computationally and memory-bounded solution. First, we introduce a dynamic version of the successive shortest-path algorithm which solves the data association problem optimally while reusing computation, resulting in faster inference than standard solvers. Second, we address the optimal solution to the data association problem when dealing with an incoming stream of data (i.e., online setting). Finally, we present our main contribution which is an approximate online solution with bounded memory and computation which is capable of handling videos of arbitrary length while performing tracking in real time. We demonstrate the effectiveness of our algorithms on the KITTI and PETS2009 benchmarks and show state-of-the-art performance, while being significantly faster than existing solvers.

pdf suppmat video project [BibTex]

pdf suppmat video project [BibTex]


Thumb md thumb3
3D Object Reconstruction from Hand-Object Interactions

Tzionas, D., Gall, J.

In International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
Recent advances have enabled 3d object reconstruction approaches using a single off-the-shelf RGB-D camera. Although these approaches are successful for a wide range of object classes, they rely on stable and distinctive geometric or texture features. Many objects like mechanical parts, toys, household or decorative articles, however, are textureless and characterized by minimalistic shapes that are simple and symmetric. Existing in-hand scanning systems and 3d reconstruction techniques fail for such symmetric objects in the absence of highly distinctive features. In this work, we show that extracting 3d hand motion for in-hand scanning effectively facilitates the reconstruction of even featureless and highly symmetric objects and we present an approach that fuses the rich additional information of hands into a 3d reconstruction pipeline, significantly contributing to the state-of-the-art of in-hand scanning.

pdf Project's Website Video Spotlight Extended Abstract YouTube DOI Project Page [BibTex]


Thumb md bogo iccv2015 teaser
Detailed Full-Body Reconstructions of Moving People from Monocular RGB-D Sequences

Bogo, F., Black, M. J., Loper, M., Romero, J.

In International Conference on Computer Vision (ICCV), pages: 2300-2308, December 2015 (inproceedings)

Abstract
We accurately estimate the 3D geometry and appearance of the human body from a monocular RGB-D sequence of a user moving freely in front of the sensor. Range data in each frame is first brought into alignment with a multi-resolution 3D body model in a coarse-to-fine process. The method then uses geometry and image texture over time to obtain accurate shape, pose, and appearance information despite unconstrained motion, partial views, varying resolution, occlusion, and soft tissue deformation. Our novel body model has variable shape detail, allowing it to capture faces with a high-resolution deformable head model and body shape with lower-resolution. Finally we combine range data from an entire sequence to estimate a high-resolution displacement map that captures fine shape details. We compare our recovered models with high-resolution scans from a professional system and with avatars created by a commercial product. We extract accurate 3D avatars from challenging motion sequences and even capture soft tissue dynamics.

Video pdf Project Page Project Page [BibTex]

Video pdf Project Page Project Page [BibTex]


Thumb md intrinsicdepth teaser1
Intrinsic Depth: Improving Depth Transfer with Intrinsic Images

Kong, N., Black, M. J.

In IEEE International Conference on Computer Vision (ICCV), pages: 3514-3522, December 2015 (inproceedings)

Abstract
We formulate the estimation of dense depth maps from video sequences as a problem of intrinsic image estimation. Our approach synergistically integrates the estimation of multiple intrinsic images including depth, albedo, shading, optical flow, and surface contours. We build upon an example-based framework for depth estimation that uses label transfer from a database of RGB and depth pairs. We combine this with a method that extracts consistent albedo and shading from video. In contrast to raw RGB values, albedo and shading provide a richer, more physical, foundation for depth transfer. Additionally we train a new contour detector to predict surface boundaries from albedo, shading, and pixel values and use this to improve the estimation of depth boundaries. We also integrate sparse structure from motion with our method to improve the metric accuracy of the estimated depth maps. We evaluate our Intrinsic Depth method quantitatively by estimating depth from videos in the NYU RGB-D and SUN3D datasets. We find that combining the estimation of multiple intrinsic images improves depth estimation relative to the baseline method.

pdf suppmat YouTube official video poster Project Page [BibTex]

pdf suppmat YouTube official video poster Project Page [BibTex]


Thumb md teaser
Permutohedral Lattice CNNs

Kiefel, M., Jampani, V., Gehler, P.

In ICLR Workshop Track, May 2015 (inproceedings)

Abstract
This paper presents a convolutional layer that is able to process sparse input features. As an example, for image recognition problems this allows an efficient filtering of signals that do not lie on a dense grid (like pixel position), but of more general features (such as color values). The presented algorithm makes use of the permutohedral lattice data structure. The permutohedral lattice was introduced to efficiently implement a bilateral filter, a commonly used image processing operation. Its use allows for a generalization of the convolution type found in current (spatial) convolutional network architectures.

pdf link (url) Project Page [BibTex]

pdf link (url) Project Page [BibTex]


Thumb md zhou
Exploiting Object Similarity in 3D Reconstruction

Zhou, C., Güney, F., Wang, Y., Geiger, A.

In International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
Despite recent progress, reconstructing outdoor scenes in 3D from movable platforms remains a highly difficult endeavor. Challenges include low frame rates, occlusions, large distortions and difficult lighting conditions. In this paper, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by locating objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows us to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. We evaluate our approach with respect to LIDAR ground truth on a novel challenging suburban dataset and show its advantages over the state-of-the-art.

pdf suppmat [BibTex]

pdf suppmat [BibTex]


Thumb md sap2015
Perception of Strength and Power of Realistic Male Characters

Wellerdiek, A., Breidt, M., Geuss, M., Streuber, S., Kloos, U., Black, M. J., Mohler, B.

In Proc. ACM SIGGRAPH Symposium on Applied Perception, SAP’15, pages: 7-14, ACM, New York, NY, September 2015 (inproceedings)

Abstract
We investigated the influence of body shape and pose on the perception of physical strength and social power for male virtual characters. In the first experiment, participants judged the physical strength of varying body shapes, derived from a statistical 3D body model. Based on these ratings, we determined three body shapes (weak, average, and strong) and animated them with a set of power poses for the second experiment. Participants rated how strong or powerful they perceived virtual characters of varying body shapes that were displayed in different poses. Our results show that perception of physical strength was mainly driven by the shape of the body. However, the social attribute of power was influenced by an interaction between pose and shape. Specifically, the effect of pose on power ratings was greater for weak body shapes. These results demonstrate that a character with a weak shape can be perceived as more powerful when in a high-power pose.

PDF DOI Project Page [BibTex]

PDF DOI Project Page [BibTex]


Thumb md teaser
Towards Probabilistic Volumetric Reconstruction using Ray Potentials

(Best Paper Award)

Ulusoy, A., Geiger, A., Black, M. J.

In 3D Vision (3DV), 2015 3rd International Conference on, pages: 10-18, Lyon, October 2015 (inproceedings)

Abstract
This paper presents a novel probabilistic foundation for volumetric 3-d reconstruction. We formulate the problem as inference in a Markov random field, which accurately captures the dependencies between the occupancy and appearance of each voxel, given all input images. Our main contribution is an approximate highly parallelized discrete-continuous inference algorithm to compute the marginal distributions of each voxel's occupancy and appearance. In contrast to the MAP solution, marginals encode the underlying uncertainty and ambiguity in the reconstruction. Moreover, the proposed algorithm allows for a Bayes optimal prediction with respect to a natural reconstruction loss. We compare our method to two state-of-the-art volumetric reconstruction algorithms on three challenging aerial datasets with LIDAR ground truth. Our experiments demonstrate that the proposed algorithm compares favorably in terms of reconstruction accuracy and the ability to expose reconstruction uncertainty.

code YouTube pdf suppmat DOI Project Page [BibTex]

code YouTube pdf suppmat DOI Project Page [BibTex]


Thumb md flowcap im
FlowCap: 2D Human Pose from Optical Flow

Romero, J., Loper, M., Black, M. J.

In Pattern Recognition, Proc. 37th German Conference on Pattern Recognition (GCPR), LNCS 9358, pages: 412-423, Springer, 2015 (inproceedings)

Abstract
We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance but unlike depth it can be directly computed from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving, there is no optical flow and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and ve- locities over time and a regression method to predict 2D body pose from part centers. No range sensor is required and FlowCap estimates 2D human pose from monocular video sources containing human motion. Such sources include hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.

video pdf preprint Project Page [BibTex]

video pdf preprint Project Page [BibTex]


Thumb md screenshot area 2015 07 27 013425
The fertilized forests Decision Forest Library

Lassner, C., Lienhart, R.

In ACM Transactions on Multimedia (ACMMM) Open-source Software Competition, October 2015 (inproceedings)

Abstract
Since the introduction of Random Forests in the 80's they have been a frequently used statistical tool for a variety of machine learning tasks. Many different training algorithms and model adaptions demonstrate the versatility of the forests. This variety resulted in a fragmentation of research and code, since each adaption requires its own algorithms and representations. In 2011, Criminisi and Shotton developed a unifying Decision Forest model for many tasks. By identifying the reusable parts and specifying clear interfaces, we extend this approach to an object oriented representation and implementation. This has the great advantage that research on specific parts of the Decision Forest model can be done `locally' by reusing well-tested and high-performance components. Our fertilized forests library is open source and easy to extend. It provides components allowing for parallelization up to node optimization level to exploit modern many core architectures. Additionally, the library provides consistent and easy-to-maintain interfaces to C++, Python and Matlab and offers cross-platform and cross-interface persistence.

website and code pdf [BibTex]

website and code pdf [BibTex]


Thumb md screenshot area 2015 07 27 014123
Active Learning for Efficient Sampling of Control Models of Collectives

Schiendorfer, A., Lassner, C., Anders, G., Reif, W., Lienhart, R.

In International Conference on Self-adaptive and Self-organizing Systems (SASO), September 2015 (inproceedings)

Abstract
Many large-scale systems benefit from an organizational structure to provide for problem decomposition. A pivotal problem solving setting is given by hierarchical control systems familiar from hierarchical task networks. If these structures can be modified autonomously by, e.g., coalition formation and reconfiguration, adequate decisions on higher levels require a faithful abstracted model of a collective of agents. An illustrative example is found in calculating schedules for a set of power plants organized in a hierarchy of Autonomous Virtual Power Plants. Functional dependencies over the combinatorial domain, such as the joint costs or rates of change of power production, are approximated by repeatedly sampling input-output pairs and substituting the actual functions by piecewise linear functions. However, if the sampled data points are weakly informative, the resulting abstracted high-level optimization introduces severe errors. Furthermore, obtaining additional point labels amounts to solving computationally hard optimization problems. Building on prior work, we propose to apply techniques from active learning to maximize the information gained by each additional point. Our results show that significantly better allocations in terms of cost-efficiency (up to 33.7 % reduction in costs in our case study) can be found with fewer but carefully selected sampling points using Decision Forests.

code (hosted on github) [BibTex]

code (hosted on github) [BibTex]


Thumb md screenshot area 2015 07 27 010243
Active Learning for Abstract Models of Collectives

Schiendorfer, A., Lassner, C., Anders, G., Reif, W., Lienhart, R.

In 3rd Workshop on Self-optimisation in Organic and Autonomic Computing Systems (SAOS), March 2015 (inproceedings)

Abstract
Organizational structures such as hierarchies provide an effective means to deal with the increasing complexity found in large-scale energy systems. In hierarchical systems, the concrete functions describing the subsystems can be replaced by abstract piecewise linear functions to speed up the optimization process. However, if the data points are weakly informative the resulting abstracted optimization problem introduces severe errors and exhibits bad runtime performance. Furthermore, obtaining additional point labels amounts to solving computationally hard optimization problems. Therefore, we propose to apply methods from active learning to search for informative inputs. We present first results experimenting with Decision Forests and Gaussian Processes that motivate further research. Using points selected by Decision Forests, we could reduce the average mean-squared error of the abstract piecewise linear function by one third.

code (hosted on github) pdf [BibTex]

code (hosted on github) pdf [BibTex]


Thumb md screenshot area 2015 07 27 004943
Norm-induced entropies for decision forests

Lassner, C., Lienhart, R.

IEEE Winter Conference on Applications of Computer Vision (WACV), January 2015 (conference)

Abstract
The entropy measurement function is a central element of decision forest induction. The Shannon entropy and other generalized entropies such as the Renyi and Tsallis entropy are designed to fulfill the Khinchin-Shannon axioms. Whereas these axioms are appropriate for physical systems, they do not necessarily model well the artificial system of decision forest induction. In this paper, we show that when omitting two of the four axioms, every norm induces an entropy function. The remaining two axioms are sufficient to describe the requirements for an entropy function in the decision forest context. Furthermore, we introduce and analyze the p-norm-induced entropy, show relations to existing entropies and the relation to various heuristics that are commonly used for decision forest training. In experiments with classification, regression and the recently introduced Hough forests, we show how the discrete and differential form of the new entropy can be used for forest induction and how the functions can simply be fine-tuned. The experiments indicate that the impact of the entropy function is limited, however can be a simple and useful post-processing step for optimizing decision forests for high performance applications.

pdf code [BibTex]

pdf code [BibTex]


Thumb md menze
Discrete Optimization for Optical Flow

Menze, M., Heipke, C., Geiger, A.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 16-28, Springer International Publishing, 2015 (inproceedings)

Abstract
We propose to look at large-displacement optical flow from a discrete point of view. Motivated by the observation that sub-pixel accuracy is easily obtained given pixel-accurate optical flow, we conjecture that computing the integral part is the hardest piece of the problem. Consequently, we formulate optical flow estimation as a discrete inference problem in a conditional random field, followed by sub-pixel refinement. Naive discretization of the 2D flow space, however, is intractable due to the resulting size of the label set. In this paper, we therefore investigate three different strategies, each able to reduce computation and memory demands by several orders of magnitude. Their combination allows us to estimate large-displacement optical flow both accurately and efficiently and demonstrates the potential of discrete optimization for optical flow. We obtain state-of-the-art performance on MPI Sintel and KITTI.

pdf suppmat project DOI Project Page [BibTex]

pdf suppmat project DOI Project Page [BibTex]


Thumb md geiger
Joint 3D Object and Layout Inference from a single RGB-D Image

(Best Paper Award)

Geiger, A., Wang, C.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 183-195, Lecture Notes in Computer Science, Springer International Publishing, 2015 (inproceedings)

Abstract
Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task. Towards this goal, we propose a high-order graphical model and jointly reason about the layout, objects and superpixels in the image. In contrast to existing holistic approaches, our model leverages detailed 3D geometry using inverse graphics and explicitly enforces occlusion and visibility constraints for respecting scene properties and projective geometry. We cast the task as MAP inference in a factor graph and solve it efficiently using message passing. We evaluate our method with respect to several baselines on the challenging NYUv2 indoor dataset using 21 object categories. Our experiments demonstrate that the proposed method is able to infer scenes with a large degree of clutter and occlusions.

pdf suppmat video project DOI Project Page Project Page [BibTex]

pdf suppmat video project DOI Project Page Project Page [BibTex]


Thumb md subimage
Smooth Loops from Unconstrained Video

Sevilla-Lara, L., Wulff, J., Sunkavalli, K., Shechtman, E.

In Computer Graphics Forum (Proceedings of EGSR), 2015 (inproceedings)

Abstract
Converting unconstrained video sequences into videos that loop seamlessly is an extremely challenging problem. In this work, we take the first steps towards automating this process by focusing on an important subclass of videos containing a single dominant foreground object. Our technique makes two novel contributions over previous work: first, we propose a correspondence-based similarity metric to automatically identify a good transition point in the video where the appearance and dynamics of the foreground are most consistent. Second, we develop a technique that aligns both the foreground and background about this transition point using a combination of global camera path planning and patch-based video morphing. We demonstrate that this allows us to create natural, compelling, loopy videos from a wide range of videos collected from the internet.

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb md bmvc2015 web teaser
Human Pose as Context for Object Detection

Srikantha, A., Gall, J.

British Machine Vision Conference, September 2015 (conference)

Abstract
Detecting small objects in images is a challenging problem particularly when they are often occluded by hands or other body parts. Recently, joint modelling of human pose and objects has been proposed to improve both pose estimation as well as object detection. These approaches, however, focus on explicit interaction with an object and lack the flexibility to combine both modalities when interaction is not obvious. We therefore propose to use human pose as an additional context information for object detection. To this end, we represent an object category by a tree model and train regression forests that localize parts of an object for each modality separately. Predictions of the two modalities are then combined to detect the bounding box of the object. We evaluate our approach on three challenging datasets which vary in the amount of object interactions and the quality of automatically extracted human poses.

pdf abstract Project Page [BibTex]

pdf abstract Project Page [BibTex]


Thumb md isa
Joint 3D Estimation of Vehicles and Scene Flow

Menze, M., Heipke, C., Geiger, A.

In Proc. of the ISPRS Workshop on Image Sequence Analysis (ISA), 2015 (inproceedings)

Abstract
Three-dimensional reconstruction of dynamic scenes is an important prerequisite for applications like mobile robotics or autonomous driving. While much progress has been made in recent years, imaging conditions in natural outdoor environments are still very challenging for current reconstruction and recognition methods. In this paper, we propose a novel unified approach which reasons jointly about 3D scene flow as well as the pose, shape and motion of vehicles in the scene. Towards this goal, we incorporate a deformable CAD model into a slanted-plane conditional random field for scene flow estimation and enforce shape consistency between the rendered 3D models and the parameters of all superpixels in the image. The association of superpixels to objects is established by an index variable which implicitly enables model selection. We evaluate our approach on the challenging KITTI scene flow dataset in terms of object and scene flow estimation. Our results provide a prove of concept and demonstrate the usefulness of our method.

PDF Project Page [BibTex]

PDF Project Page [BibTex]


Thumb md screen shot 2015 05 07 at 11.56.54
3D Object Class Detection in the Wild

Pepik, B., Stark, M., Gehler, P., Ritschel, T., Schiele, B.

In Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2015 (inproceedings)

Project Page [BibTex]

Project Page [BibTex]


Thumb md silviateaser
The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose

Zuffi, S., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 3537-3546, June 2015 (inproceedings)

Abstract
We propose a new 3D model of the human body that is both realistic and part-based. The body is represented by a graphical model in which nodes of the graph correspond to body parts that can independently translate and rotate in 3D as well as deform to capture pose-dependent shape variations. Pairwise potentials define a “stitching cost” for pulling the limbs apart, giving rise to the stitched puppet model (SPM). Unlike existing realistic 3D body models, the distributed representation facilitates inference by allowing the model to more effectively explore the space of poses, much like existing 2D pictorial structures models. We infer pose and body shape using a form of particle-based max-product belief propagation. This gives the SPM the realism of recent 3D body models with the computational advantages of part-based models. We apply the SPM to two challenging problems involving estimating human shape and pose from 3D data. The first is the FAUST mesh alignment challenge (http://faust.is.tue.mpg.de/), where ours is the first method to successfully align all 3D meshes. The second involves estimating pose and shape from crude visual hull representations of complex body movements.

pdf Extended Abstract poster code/project video DOI Project Page [BibTex]

pdf Extended Abstract poster code/project video DOI Project Page [BibTex]


Thumb md jonasteaser
Efficient Sparse-to-Dense Optical Flow Estimation using a Learned Basis and Layers

Wulff, J., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 120-130, June 2015 (inproceedings)

Abstract
We address the elusive goal of estimating optical flow both accurately and efficiently by adopting a sparse-to-dense approach. Given a set of sparse matches, we regress to dense optical flow using a learned set of full-frame basis flow fields. We learn the principal components of natural flow fields using flow computed from four Hollywood movies. Optical flow fields are then compactly approximated as a weighted sum of the basis flow fields. Our new PCA-Flow algorithm robustly estimates these weights from sparse feature matches. The method runs in under 300ms/frame on the MPI-Sintel dataset using a single CPU and is more accurate and significantly faster than popular methods such as LDOF and Classic+NL. The results, however, are too smooth for some applications. Consequently, we develop a novel sparse layered flow method in which each layer is represented by PCA-flow. Unlike existing layered methods, estimation is fast because it uses only sparse matches. We combine information from different layers into a dense flow field using an image-aware MRF. The resulting PCA-Layers method runs in 3.6s/frame, is significantly more accurate than PCA-flow and achieves state-of-the-art performance in occluded regions on MPI-Sintel.

pdf Extended Abstract poster code Project Page [BibTex]

pdf Extended Abstract poster code Project Page [BibTex]


Thumb md ijazteaser
Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction

Akhter, I., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 1446-1455, June 2015 (inproceedings)

Abstract
The estimation of 3D human pose from 2D joint locations is central to many vision problems involving the analysis of people in images and video. To address the fact that the problem is inherently ill posed, many methods impose a prior over human poses. Unfortunately these priors admit invalid poses because they do not model how joint-limits vary with pose. Here we make two key contributions. First, we collected a motion capture dataset that explores a wide range of human poses. From this we learn a pose-dependent model of joint limits that forms our prior. The dataset and the prior will be made publicly available. Second, we define a general parameterization of body pose and a new, multistage, method to estimate 3D pose from 2D joint locations that uses an over-complete dictionary of human poses. Our method shows good generalization while avoiding impossible poses. We quantitatively compare our method with recent work and show state-of-the-art results on 2D to 3D pose estimation using the CMU mocap dataset. We also show superior results on manual annotations on real images and automatic part-based detections on the Leeds sports pose dataset.

pdf Extended Abstract video project/data/code poster DOI Project Page [BibTex]

pdf Extended Abstract video project/data/code poster DOI Project Page [BibTex]