2015


Object Scene Flow for Autonomous Vehicles

Menze, M., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 3061-3070, IEEE, June 2015 (inproceedings)

Abstract
This paper proposes a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. Taking advantage of the fact that outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object. This minimal representation increases robustness and leads to a discrete-continuous CRF where the data term decomposes into pairwise potentials between superpixels and objects. Moreover, our model intrinsically segments the scene into its constituting dynamic components. We demonstrate the performance of our model on existing benchmarks as well as a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using detailed 3D CAD models for all vehicles in motion. Our experiments also reveal novel challenges which cannot be handled by existing methods.

pdf abstract suppmat DOI [BibTex]

Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction

Akhter, I., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 1446-1455, June 2015 (inproceedings)

Abstract
The estimation of 3D human pose from 2D joint locations is central to many vision problems involving the analysis of people in images and video. To address the fact that the problem is inherently ill posed, many methods impose a prior over human poses. Unfortunately these priors admit invalid poses because they do not model how joint-limits vary with pose. Here we make two key contributions. First, we collected a motion capture dataset that explores a wide range of human poses. From this we learn a pose-dependent model of joint limits that forms our prior. The dataset and the prior will be made publicly available. Second, we define a general parameterization of body pose and a new, multistage, method to estimate 3D pose from 2D joint locations that uses an over-complete dictionary of human poses. Our method shows good generalization while avoiding impossible poses. We quantitatively compare our method with recent work and show state-of-the-art results on 2D to 3D pose estimation using the CMU mocap dataset. We also show superior results on manual annotations on real images and automatic part-based detections on the Leeds sports pose dataset.

pdf Extended Abstract video project/data/code poster DOI Project Page [BibTex]

Efficient Sparse-to-Dense Optical Flow Estimation using a Learned Basis and Layers

Wulff, J., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2015), pages: 120-130, June 2015 (inproceedings)

Abstract
We address the elusive goal of estimating optical flow both accurately and efficiently by adopting a sparse-to-dense approach. Given a set of sparse matches, we regress to dense optical flow using a learned set of full-frame basis flow fields. We learn the principal components of natural flow fields using flow computed from four Hollywood movies. Optical flow fields are then compactly approximated as a weighted sum of the basis flow fields. Our new PCA-Flow algorithm robustly estimates these weights from sparse feature matches. The method runs in under 300ms/frame on the MPI-Sintel dataset using a single CPU and is more accurate and significantly faster than popular methods such as LDOF and Classic+NL. The results, however, are too smooth for some applications. Consequently, we develop a novel sparse layered flow method in which each layer is represented by PCA-flow. Unlike existing layered methods, estimation is fast because it uses only sparse matches. We combine information from different layers into a dense flow field using an image-aware MRF. The resulting PCA-Layers method runs in 3.6s/frame, is significantly more accurate than PCA-flow and achieves state-of-the-art performance in occluded regions on MPI-Sintel.

pdf Extended Abstract Supplemental Material Poster Code Project Page [BibTex]
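
The sparse-to-dense step described in the abstract above (a dense flow field approximated as a weighted sum of basis flow fields, with the weights estimated from sparse matches) can be illustrated with a small sketch. This is not the authors' PCA-Flow code. It assumes a random orthonormal basis in place of the learned principal components, noise-free matches, and a plain ridge-regularized least-squares fit instead of the robust estimator the abstract mentions. The function name `dense_flow_from_sparse` and all sizes are hypothetical.

```python
import numpy as np

def dense_flow_from_sparse(basis, match_idx, match_flow, ridge=1e-6):
    """Fit basis weights to sparse flow observations, then reconstruct densely.

    basis      : (K, N) array, each row a flattened basis flow field (assumed layout)
    match_idx  : indices of the N flattened entries where a sparse match exists
    match_flow : observed flow values at those indices
    """
    A = basis[:, match_idx].T                      # (M, K) basis restricted to matches
    K = basis.shape[0]
    # Ridge-regularized normal equations (a stand-in for a robust estimator).
    w = np.linalg.solve(A.T @ A + ridge * np.eye(K), A.T @ match_flow)
    return w, w @ basis                            # weights and dense reconstruction

# Synthetic example: a stand-in orthonormal basis, not a learned one.
rng = np.random.default_rng(0)
N, K, M = 200, 5, 40
basis = np.linalg.qr(rng.standard_normal((N, K)))[0].T   # (K, N), orthonormal rows
true_w = rng.standard_normal(K)
true_flow = true_w @ basis
match_idx = rng.choice(N, size=M, replace=False)          # sparse match locations

w, dense = dense_flow_from_sparse(basis, match_idx, true_flow[match_idx])
```

Because the synthetic flow lies exactly in the span of the basis, 40 observations are enough to recover all 200 entries here. With real matches, outliers would make the plain least-squares fit unreliable, which is presumably why the abstract stresses robust estimation.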


Permutohedral Lattice CNNs

Kiefel, M., Jampani, V., Gehler, P. V.

In ICLR Workshop Track, May 2015 (inproceedings)

Abstract
This paper presents a convolutional layer that is able to process sparse input features. As an example, for image recognition problems this allows an efficient filtering of signals that do not lie on a dense grid (like pixel position), but of more general features (such as color values). The presented algorithm makes use of the permutohedral lattice data structure. The permutohedral lattice was introduced to efficiently implement a bilateral filter, a commonly used image processing operation. Its use allows for a generalization of the convolution type found in current (spatial) convolutional network architectures.

pdf link (url) [BibTex]

Consensus Message Passing for Layered Graphical Models

Jampani, V., Eslami, S. M. A., Tarlow, D., Kohli, P., Winn, J.

In Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 38, pages: 425-433, JMLR Workshop and Conference Proceedings, May 2015 (inproceedings)

Abstract
Generative models provide a powerful framework for probabilistic reasoning. However, in many domains their use has been hampered by the practical difficulties of inference. This is particularly the case in computer vision, where models of the imaging process tend to be large, loopy and layered. For this reason bottom-up conditional models have traditionally dominated in such domains. We find that widely-used, general-purpose message passing inference algorithms such as Expectation Propagation (EP) and Variational Message Passing (VMP) fail on the simplest of vision models. With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing 'consensus' messages that guide inference towards good solutions. Experiments on a variety of problems show that the proposed technique leads to significantly more accurate inference results, not only when compared to standard EP and VMP, but also when compared to competitive bottom-up conditional models.

online pdf supplementary link (url) [BibTex]

Active Learning for Abstract Models of Collectives

Schiendorfer, A., Lassner, C., Anders, G., Reif, W., Lienhart, R.

In 3rd Workshop on Self-optimisation in Organic and Autonomic Computing Systems (SAOS), March 2015 (inproceedings)

Abstract
Organizational structures such as hierarchies provide an effective means to deal with the increasing complexity found in large-scale energy systems. In hierarchical systems, the concrete functions describing the subsystems can be replaced by abstract piecewise linear functions to speed up the optimization process. However, if the data points are weakly informative the resulting abstracted optimization problem introduces severe errors and exhibits bad runtime performance. Furthermore, obtaining additional point labels amounts to solving computationally hard optimization problems. Therefore, we propose to apply methods from active learning to search for informative inputs. We present first results experimenting with Decision Forests and Gaussian Processes that motivate further research. Using points selected by Decision Forests, we could reduce the average mean-squared error of the abstract piecewise linear function by one third.

code (hosted on github) pdf [BibTex]

Efficient Facade Segmentation using Auto-Context

Jampani, V., Gadde, R., Gehler, P. V.

In Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pages: 1038-1045, IEEE, January 2015 (inproceedings)

Abstract
In this paper we propose a system for the problem of facade segmentation. Building facades are highly structured images, and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. We describe a system that is almost domain independent and consists of standard segmentation methods. A sequence of boosted decision trees is stacked using auto-context features and learned using the stacked generalization technique. We find that this technique, albeit standard, performs better than or equal to all previously published empirical results on all available facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test time inference.

website pdf supplementary IEEE page link (url) DOI Project Page [BibTex]

Norm-induced entropies for decision forests

Lassner, C., Lienhart, R.

IEEE Winter Conference on Applications of Computer Vision (WACV), January 2015 (conference)

Abstract
The entropy measurement function is a central element of decision forest induction. The Shannon entropy and other generalized entropies such as the Renyi and Tsallis entropy are designed to fulfill the Khinchin-Shannon axioms. Whereas these axioms are appropriate for physical systems, they do not necessarily model well the artificial system of decision forest induction. In this paper, we show that when omitting two of the four axioms, every norm induces an entropy function. The remaining two axioms are sufficient to describe the requirements for an entropy function in the decision forest context. Furthermore, we introduce and analyze the p-norm-induced entropy, show relations to existing entropies and the relation to various heuristics that are commonly used for decision forest training. In experiments with classification, regression and the recently introduced Hough forests, we show how the discrete and differential form of the new entropy can be used for forest induction and how the functions can simply be fine-tuned. The experiments indicate that the impact of the entropy function is limited; however, fine-tuning it can be a simple and useful post-processing step for optimizing decision forests for high performance applications.

pdf code [BibTex]

Dataset Suite for Benchmarking Perception in Robotics

Ahmad, A., Lima, P.

In International Conference on Intelligent Robots and Systems (IROS) 2015, 2015 (inproceedings)

[BibTex]

FlowCap: 2D Human Pose from Optical Flow

Romero, J., Loper, M., Black, M. J.

In Pattern Recognition, Proc. 37th German Conference on Pattern Recognition (GCPR), LNCS 9358, pages: 412-423, Springer, 2015 (inproceedings)

Abstract
We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance but unlike depth it can be directly computed from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving, there is no optical flow and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and velocities over time and a regression method to predict 2D body pose from part centers. No range sensor is required and FlowCap estimates 2D human pose from monocular video sources containing human motion. Such sources include hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.

video pdf preprint Project Page [BibTex]
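
The abstract above mentions a Kalman filter that propagates body part positions and velocities when flow-based detections vanish. As a rough illustration of that idea only, here is a minimal one-dimensional constant-velocity Kalman filter that keeps predicting while measurements are missing. It is not the FlowCap implementation. The state layout, the noise values `q` and `r`, and the function name `kalman_step` are assumptions made for this sketch.

```python
def kalman_step(state, cov, z, dt=1.0, q=1e-2, r=1.0):
    """One predict (and optional update) step of a 1D constant-velocity filter.

    state : (position, velocity)
    cov   : 2x2 covariance as nested tuples
    z     : measured position, or None when no detection is available
    q, r  : process and measurement noise variances (arbitrary values here)
    """
    p, v = state
    (p00, p01), (p10, p11) = cov

    # Predict: position advances by velocity; covariance becomes F P F^T + Q.
    p = p + dt * v
    n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q

    if z is None:
        # No detection (e.g. the person stopped moving): keep the prediction.
        return (p, v), ((n00, n01), (n10, n11))

    # Update with a position-only measurement (H = [1, 0]).
    s = n00 + r                      # innovation variance
    k0, k1 = n00 / s, n10 / s        # Kalman gain
    y = z - p                        # innovation
    p, v = p + k0 * y, v + k1 * y
    cov = (((1 - k0) * n00, (1 - k0) * n01),
           (n10 - k1 * n00, n11 - k1 * n01))
    return (p, v), cov

# A part center moving at roughly unit speed, then two frames with no detection.
state, cov = (0.0, 0.0), ((10.0, 0.0), (0.0, 10.0))
measurements = [1.0, 2.0, 3.0, 4.0, 5.0, None, None]
track = []
for z in measurements:
    state, cov = kalman_step(state, cov, z)
    track.append(state[0])
```

During the two frames without a measurement, the estimated position keeps advancing by the last velocity estimate, so the track does not vanish when the detections do.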

Towards Optimal Robot Navigation in Urban Homes

Ventura, R., Ahmad, A.

In RoboCup 2014: Robot World Cup XVIII, pages: 318-331, Lecture Notes in Computer Science ; 8992, Springer, Cham, Switzerland, 2015 (inproceedings)

Abstract
The work presented in this paper is motivated by the goal of dependable autonomous navigation of mobile robots. This goal is a fundamental requirement for having autonomous robots in spaces such as domestic spaces and public establishments, left unattended by technical staff. In this paper we tackle this problem by taking an optimization approach: on one hand, we use a Fast Marching Approach for path planning, resulting in optimal paths in the absence of unmapped obstacles, and on the other hand we use a Dynamic Window Approach for guidance. To the best of our knowledge, the combination of these two methods is novel. We evaluate the approach on a real mobile robot, capable of moving at high speed. The evaluation makes use of an external ground truth system. We report controlled experiments that we performed, including the presence of people moving randomly near the robot. In our long-term experiments we report a total distance of 18 km traveled during 11 hours of movement time.

DOI [BibTex]

Joint 3D Object and Layout Inference from a single RGB-D Image

(Best Paper Award)

Geiger, A., Wang, C.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 183-195, Lecture Notes in Computer Science, Springer International Publishing, 2015 (inproceedings)

Abstract
Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task. Towards this goal, we propose a high-order graphical model and jointly reason about the layout, objects and superpixels in the image. In contrast to existing holistic approaches, our model leverages detailed 3D geometry using inverse graphics and explicitly enforces occlusion and visibility constraints for respecting scene properties and projective geometry. We cast the task as MAP inference in a factor graph and solve it efficiently using message passing. We evaluate our method with respect to several baselines on the challenging NYUv2 indoor dataset using 21 object categories. Our experiments demonstrate that the proposed method is able to infer scenes with a large degree of clutter and occlusions.

pdf suppmat video project DOI [BibTex]

3D Object Class Detection in the Wild

Pepik, B., Stark, M., Gehler, P., Ritschel, T., Schiele, B.

In Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2015 (inproceedings)

Project Page [BibTex]

Discrete Optimization for Optical Flow

Menze, M., Heipke, C., Geiger, A.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 16-28, Springer International Publishing, 2015 (inproceedings)

Abstract
We propose to look at large-displacement optical flow from a discrete point of view. Motivated by the observation that sub-pixel accuracy is easily obtained given pixel-accurate optical flow, we conjecture that computing the integral part is the hardest piece of the problem. Consequently, we formulate optical flow estimation as a discrete inference problem in a conditional random field, followed by sub-pixel refinement. Naive discretization of the 2D flow space, however, is intractable due to the resulting size of the label set. In this paper, we therefore investigate three different strategies, each able to reduce computation and memory demands by several orders of magnitude. Their combination allows us to estimate large-displacement optical flow both accurately and efficiently and demonstrates the potential of discrete optimization for optical flow. We obtain state-of-the-art performance on MPI Sintel and KITTI.

pdf suppmat project DOI [BibTex]

Joint 3D Estimation of Vehicles and Scene Flow

Menze, M., Heipke, C., Geiger, A.

In Proc. of the ISPRS Workshop on Image Sequence Analysis (ISA), 2015 (inproceedings)

Abstract
Three-dimensional reconstruction of dynamic scenes is an important prerequisite for applications like mobile robotics or autonomous driving. While much progress has been made in recent years, imaging conditions in natural outdoor environments are still very challenging for current reconstruction and recognition methods. In this paper, we propose a novel unified approach which reasons jointly about 3D scene flow as well as the pose, shape and motion of vehicles in the scene. Towards this goal, we incorporate a deformable CAD model into a slanted-plane conditional random field for scene flow estimation and enforce shape consistency between the rendered 3D models and the parameters of all superpixels in the image. The association of superpixels to objects is established by an index variable which implicitly enables model selection. We evaluate our approach on the challenging KITTI scene flow dataset in terms of object and scene flow estimation. Our results provide a proof of concept and demonstrate the usefulness of our method.

PDF [BibTex]

A Setup for multi-UAV hardware-in-the-loop simulations

Odelga, M., Stegagno, P., Bülthoff, H., Ahmad, A.

In pages: 204-210, IEEE, 2015 (inproceedings)

Abstract
In this paper, we present a hardware in the loop simulation setup for multi-UAV systems. With our setup, we are able to command the robots simulated in Gazebo, a popular open source ROS-enabled physical simulator, using the computational units that are embedded on our quadrotor UAVs. Hence, we can test in simulation not only the correct execution of algorithms, but also the computational feasibility directly on the robot hardware. In addition, since our setup is inherently multi-robot, we can also test the communication flow among the robots. We provide two use cases to show the characteristics of our setup.

link (url) DOI [BibTex]

Smooth Loops from Unconstrained Video

Sevilla-Lara, L., Wulff, J., Sunkavalli, K., Shechtman, E.

In Computer Graphics Forum (Proceedings of EGSR), 34(4):99-107, 2015 (inproceedings)

Abstract
Converting unconstrained video sequences into videos that loop seamlessly is an extremely challenging problem. In this work, we take the first steps towards automating this process by focusing on an important subclass of videos containing a single dominant foreground object. Our technique makes two novel contributions over previous work: first, we propose a correspondence-based similarity metric to automatically identify a good transition point in the video where the appearance and dynamics of the foreground are most consistent. Second, we develop a technique that aligns both the foreground and background about this transition point using a combination of global camera path planning and patch-based video morphing. We demonstrate that this allows us to create natural, compelling, loopy videos from a wide range of videos collected from the internet.

pdf link (url) DOI Project Page [BibTex]

Onboard robust person detection and tracking for domestic service robots

Sanz, D., Ahmad, A., Lima, P.

In Robot 2015: Second Iberian Robotics Conference, pages: 547-559, Advances in Intelligent Systems and Computing ; 418, Springer, Cham, Switzerland, 2015 (inproceedings)

Abstract
Domestic assistance for the elderly and impaired people is one of the biggest upcoming challenges of our society. Consequently, in-home care through domestic service robots is identified as one of the most important application areas of robotics research. Assistive tasks may range from visitor reception at the door to catering for the owner's small daily necessities within a house. Since most of these tasks require the robot to interact directly with humans, a predominant robot functionality is to detect and track humans in real time: either the owner of the robot or visitors at home or both. In this article we present a robust method for such a functionality that combines depth-based segmentation and visual detection. The robustness of our method lies in its capability to not only identify partially occluded humans (e.g., with only torso visible) but also to do so in varying lighting conditions. We thoroughly validate our method through extensive experiments on real robot datasets and comparisons with the ground truth. The datasets were collected in a home-like environment set up within the context of RoboCup@Home and RoCKIn@Home competitions.

DOI [BibTex]

2014


Hough-based Object Detection with Grouped Features

Srikantha, A., Gall, J.

International Conference on Image Processing, pages: 1653-1657, Paris, France, October 2014 (conference)

Abstract
Hough-based voting approaches have been successfully applied to object detection. While these methods can be efficiently implemented by random forests, they estimate the probability for an object hypothesis for each feature independently. In this work, we address this problem by grouping features in a local neighborhood to obtain a better estimate of the probability. To this end, we propose oblique classification-regression forests that combine features of different trees. We further investigate the benefit of combining independent and grouped features and evaluate the approach on RGB and RGB-D datasets.

pdf poster DOI Project Page [BibTex]

Omnidirectional 3D Reconstruction in Augmented Manhattan Worlds

Schoenbein, M., Geiger, A.

International Conference on Intelligent Robots and Systems, pages: 716-723, IEEE, Chicago, IL, USA, October 2014 (conference)

Abstract
This paper proposes a method for high-quality omnidirectional 3D reconstruction of augmented Manhattan worlds from catadioptric stereo video sequences. In contrast to existing works we do not rely on constructing virtual perspective views, but instead propose to optimize depth jointly in a unified omnidirectional space. Furthermore, we show that plane-based prior models can be applied even though planes in 3D do not project to planes in the omnidirectional domain. Towards this goal, we propose an omnidirectional slanted-plane Markov random field model which relies on plane hypotheses extracted using a novel voting scheme for 3D planes in omnidirectional space. To quantitatively evaluate our method we introduce a dataset which we have captured using our autonomous driving platform AnnieWAY which we equipped with two horizontally aligned catadioptric cameras and a Velodyne HDL-64E laser scanner for precise ground truth depth measurements. As evidenced by our experiments, the proposed method clearly benefits from the unified view and significantly outperforms existing stereo matching techniques both quantitatively and qualitatively. Furthermore, our method is able to reduce noise and the obtained depth maps can be represented very compactly by a small number of image segments and plane parameters.

pdf DOI [BibTex]

Image-based 4-d Reconstruction Using 3-d Change Detection

Ulusoy, A. O., Mundy, J. L.

In Computer Vision – ECCV 2014, pages: 31-45, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars), Springer International Publishing, September 2014 (inproceedings)

Abstract
This paper describes an approach to reconstruct the complete history of a 3-d scene over time from imagery. The proposed approach avoids rebuilding 3-d models of the scene at each time instant. Instead, the approach employs an initial 3-d model which is continuously updated with changes in the environment to form a full 4-d representation. This updating scheme is enabled by a novel algorithm that infers 3-d changes with respect to the model at one time step from images taken at a subsequent time step. This algorithm can effectively detect changes even when the illumination conditions between image collections are significantly different. The performance of the proposed framework is demonstrated on four challenging datasets in terms of 4-d modeling accuracy as well as quantitative evaluation of 3-d change detection.

video pdf supplementary DOI [BibTex]

Human Pose Estimation with Fields of Parts

Kiefel, M., Gehler, P.

In Computer Vision – ECCV 2014, LNCS 8693, pages: 331-346, Lecture Notes in Computer Science, (Editors: Fleet, David and Pajdla, Tomas and Schiele, Bernt and Tuytelaars, Tinne), Springer, September 2014 (inproceedings)

Abstract
This paper proposes a new formulation of the human pose estimation problem. We present the Fields of Parts model, a binary Conditional Random Field model designed to detect human body parts of articulated people in single images. The Fields of Parts model is inspired by the idea of Pictorial Structures: it models local appearance and joint spatial configuration of the human body. However, the underlying graph structure is entirely different. The idea is simple: we model the presence and absence of a body part at every possible position, orientation, and scale in an image with a binary random variable. This results in a vast number of random variables; however, we show that approximate inference in this model is efficient. Moreover, we can encode the very same appearance and spatial structure as in Pictorial Structures models. This approach allows us to combine ideas from segmentation and pose estimation into a single model. The Fields of Parts model can use evidence from the background, include local color information, and it is connected more densely than a kinematic chain structure. On the challenging Leeds Sports Poses dataset we improve over the Pictorial Structures counterpart by 5.5% in terms of Average Precision of Keypoints (APK).

website pdf DOI Project Page [BibTex]

Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points

Tzionas, D., Srikantha, A., Aponte, P., Gall, J.

In German Conference on Pattern Recognition (GCPR), pages: 1-13, Lecture Notes in Computer Science, Springer, September 2014 (inproceedings)

Abstract
Hand motion capture has been an active research topic in recent years, following the success of full-body pose tracking. Despite similarities, hand tracking proves to be more challenging, characterized by a higher dimensionality, severe occlusions and self-similarity between fingers. For this reason, most approaches rely on strong assumptions, like hands in isolation or expensive multi-camera systems, that limit the practical use. In this work, we propose a framework for hand tracking that can capture the motion of two interacting hands using only a single, inexpensive RGB-D camera. Our approach combines a generative model with collision detection and discriminatively learned salient points. We quantitatively evaluate our approach on 14 new sequences with challenging interactions.

pdf Supplementary pdf Supplementary Material Project Page DOI Project Page [BibTex]

OpenDR: An Approximate Differentiable Renderer

Loper, M. M., Black, M. J.

In Computer Vision – ECCV 2014, 8695, pages: 154-169, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars), Springer International Publishing, September 2014 (inproceedings)

Abstract
Inverse graphics attempts to take sensor data and infer 3D geometry, illumination, materials, and motions such that a graphics renderer could realistically reproduce the observed scene. Renderers, however, are designed to solve the forward process of image synthesis. To go in the other direction, we propose an approximate differentiable renderer (DR) that explicitly models the relationship between changes in model parameters and image observations. We describe a publicly available OpenDR framework that makes it easy to express a forward graphics model and then automatically obtain derivatives with respect to the model parameters and to optimize over them. Built on a new autodifferentiation package and OpenGL, OpenDR provides a local optimization method that can be incorporated into probabilistic programming frameworks. We demonstrate the power and simplicity of programming with OpenDR by using it to solve the problem of estimating human body shape from Kinect depth and RGB data.

pdf Code Chumpy Supplementary video of talk DOI Project Page [BibTex]

Discovering Object Classes from Activities

Srikantha, A., Gall, J.

In European Conference on Computer Vision, 8694, pages: 415-430, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars), Springer International Publishing, September 2014 (inproceedings)

Abstract
In order to avoid an expensive manual labeling process or to learn object classes autonomously without human intervention, object discovery techniques have been proposed that extract visually similar objects from weakly labelled videos. However, the problem of discovering small or medium sized objects is largely unexplored. We observe that videos with activities involving human-object interactions can serve as weakly labelled data for such cases. Since neither object appearance nor motion is distinct enough to discover objects in these videos, we propose a framework that samples from a space of algorithms and their parameters to extract sequences of object proposals. Furthermore, we model similarity of objects based on appearance and functionality, which is derived from human and object motion. We show that functionality is an important cue for discovering objects from activities and demonstrate the generality of the model on three challenging RGB-D and RGB datasets.

pdf anno poster DOI Project Page [BibTex]

Probabilistic Progress Bars

Kiefel, M., Schuler, C., Hennig, P.

In Conference on Pattern Recognition (GCPR), 8753, pages: 331-341, Lecture Notes in Computer Science, (Editors: Jiang, X., Hornegger, J., and Koch, R.), Springer, September 2014 (inproceedings)

Abstract
Predicting the time at which the integral over a stochastic process reaches a target level is of interest in many applications. Often, such computations have to be made at low cost, in real time. As an intuitive example that captures many features of this problem class, we choose progress bars, a ubiquitous element of computer user interfaces. These predictors are usually based on simple point estimators, with no error modelling. This leads to fluctuating behaviour that is confusing to the user. It also does not provide a distribution prediction (risk values), which is crucial for many other application areas. We construct and empirically evaluate a fast, constant cost algorithm using a Gauss-Markov process model which provides more information to the user.

website+code pdf DOI [BibTex]

Optical Flow Estimation with Channel Constancy

Sevilla-Lara, L., Sun, D., Learned-Miller, E. G., Black, M. J.

In Computer Vision – ECCV 2014, 8689, pages: 423-438, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars), Springer International Publishing, September 2014 (inproceedings)

Abstract
Large motions remain a challenge for current optical flow algorithms. Traditionally, large motions are addressed using multi-resolution representations like Gaussian pyramids. To deal with large displacements, many pyramid levels are needed and, if an object is small, it may be invisible at the highest levels. To address this we decompose images using a channel representation (CR) and replace the standard brightness constancy assumption with a descriptor constancy assumption. CRs can be seen as an over-segmentation of the scene into layers based on some image feature. If the appearance of a foreground object differs from the background then its descriptor will be different and they will be represented in different layers. We create a pyramid by smoothing these layers, without mixing foreground and background or losing small objects. Our method estimates more accurate flow than the baseline on the MPI-Sintel benchmark, especially for fast motions and near motion boundaries.

pdf DOI [BibTex]


Modeling Blurred Video with Layers

Wulff, J., Black, M. J.

In Computer Vision – ECCV 2014, 8694, pages: 236-252, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars), Springer International Publishing, September 2014 (inproceedings)

Abstract
Videos contain complex spatially-varying motion blur due to the combination of object motion, camera motion, and depth variation with finite shutter speeds. Existing methods to estimate optical flow, deblur the images, and segment the scene fail in such cases. In particular, boundaries between differently moving objects cause problems, because here the blurred images are a combination of the blurred appearances of multiple surfaces. We address this with a novel layered model of scenes in motion. From a motion-blurred video sequence, we jointly estimate the layer segmentation and each layer's appearance and motion. Since the blur is a function of the layer motion and segmentation, it is completely determined by our generative model. Given a video, we formulate the optimization problem as minimizing the pixel error between the blurred frames and images synthesized from the model, and solve it using gradient descent. We demonstrate our approach on synthetic and real sequences.

pdf Supplemental Video Data DOI Project Page Project Page [BibTex]


Intrinsic Video

Kong, N., Gehler, P. V., Black, M. J.

In Computer Vision – ECCV 2014, 8690, pages: 360-375, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars), Springer International Publishing, September 2014 (inproceedings)

Abstract
Intrinsic images such as albedo and shading are valuable for later stages of visual processing. Previous methods for extracting albedo and shading use either single images or images together with depth data. Instead, we define intrinsic video estimation as the problem of extracting temporally coherent albedo and shading from video alone. Our approach exploits the assumption that albedo is constant over time while shading changes slowly. Optical flow aids in the accurate estimation of intrinsic video by providing temporal continuity as well as putative surface boundaries. Additionally, we find that the estimated albedo sequence can be used to improve optical flow accuracy in sequences with changing illumination. The approach makes only weak assumptions about the scene and we show that it substantially outperforms existing single-frame intrinsic image methods. We evaluate this quantitatively on synthetic sequences as well as on challenging natural sequences with complex geometry, motion, and illumination.

pdf Supplementary Video DOI Project Page Project Page [BibTex]


Automated Detection of New or Evolving Melanocytic Lesions Using a 3D Body Model

Bogo, F., Romero, J., Peserico, E., Black, M. J.

In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 8673, pages: 593-600, Lecture Notes in Computer Science, (Editors: Golland, Polina and Hata, Nobuhiko and Barillot, Christian and Hornegger, Joachim and Howe, Robert), Springer International Publishing, September 2014 (inproceedings)

Abstract
Detection of new or rapidly evolving melanocytic lesions is crucial for early diagnosis and treatment of melanoma. We propose a fully automated pre-screening system for detecting new lesions or changes in existing ones, on the order of 2-3 mm, over almost the entire body surface. Our solution is based on a multi-camera 3D stereo system. The system captures 3D textured scans of a subject at different times and then brings these scans into correspondence by aligning them with a learned, parametric, non-rigid 3D body model. This means that captured skin textures are in accurate alignment across scans, facilitating the detection of new or changing lesions. The integration of lesion segmentation with a deformable 3D body model is a key contribution that makes our approach robust to changes in illumination and subject pose.

pdf Poster DOI Project Page [BibTex]


Tracking using Multilevel Quantizations

Hong, Z., Wang, C., Mei, X., Prokhorov, D., Tao, D.

In Computer Vision – ECCV 2014, 8694, pages: 155-171, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars), Springer International Publishing, September 2014 (inproceedings)

Abstract
Most object tracking methods only exploit a single quantization of an image space: pixels, superpixels, or bounding boxes, each of which has advantages and disadvantages. It is highly unlikely that a common optimal quantization level, suitable for tracking all objects in all environments, exists. We therefore propose a hierarchical appearance representation model for tracking, based on a graphical model that exploits shared information across multiple quantization levels. The tracker aims to find the most probable position of the target by jointly classifying the pixels and superpixels and obtaining the best configuration across all levels. The motion of the bounding box is taken into consideration, while Online Random Forests are used to provide pixel- and superpixel-level quantizations and are progressively updated on the fly. By appropriately considering the multilevel quantizations, our tracker exhibits not only excellent performance in handling non-rigid object deformation, but also robustness to occlusions. A quantitative evaluation is conducted on two benchmark datasets: a non-rigid object tracking dataset (11 sequences) and the CVPR2013 tracking benchmark (50 sequences). Experimental results show that our tracker overcomes various tracking challenges and is superior to a number of other popular tracking methods.

pdf DOI [BibTex]


The RoCKIn@Home User Story

Schneider, S., Hegger, F., Kraetzschmar, G., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Iocchi, L., Lima, P., Matteucci, M., Nardi, D., Awaad, I., Ahmad, A., Fontana, G., Hochgeschwender, N., Schiaffonati, V.

June 2014 (conference)

[BibTex]


Overview on the RoCKIn@Work Challenge

Dwiputra, R., Berghofer, J., Amigoni, F., Bischoff, R., Bonarini, A., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Ahmad, A., Awaad, I., Fontana, G., Hegger, F., Hochgeschwender, N., Schiaffonati, V., Schneider, S.

June 2014 (conference)

[BibTex]


Human Pose Estimation: New Benchmark and State of the Art Analysis

Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3686-3693, IEEE, June 2014 (inproceedings)

pdf DOI Project Page Project Page Project Page [BibTex]


FAUST: Dataset and evaluation for 3D mesh registration

(Dataset Award, Eurographics Symposium on Geometry Processing (SGP), 2016)

Bogo, F., Romero, J., Loper, M., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3794-3801, Columbus, Ohio, USA, June 2014 (inproceedings)

Abstract
New scanning technologies are increasing the importance of 3D mesh data and the need for algorithms that can reliably align it. Surface registration is important for building full 3D models from partial scans, creating statistical shape models, shape retrieval, and tracking. The problem is particularly challenging for non-rigid and articulated objects like human bodies. While the challenges of real-world data registration are not present in existing synthetic datasets, establishing ground-truth correspondences for real 3D scans is difficult. We address this with a novel mesh registration technique that combines 3D shape and appearance information to produce high-quality alignments. We define a new dataset called FAUST that contains 300 scans of 10 people in a wide range of poses together with an evaluation methodology. To achieve accurate registration, we paint the subjects with high-frequency textures and use an extensive validation process to ensure accurate ground truth. We find that current shape registration methods have trouble with this real-world data. The dataset and evaluation website are available for research purposes at http://faust.is.tue.mpg.de.

pdf Video Dataset Poster Talk DOI Project Page Project Page Project Page [BibTex]


Model Transport: Towards Scalable Transfer Learning on Manifolds

Freifeld, O., Hauberg, S., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 1378-1385, Columbus, Ohio, USA, June 2014 (inproceedings)

Abstract
We consider the intersection of two research fields: transfer learning and statistics on manifolds. In particular, we consider, for manifold-valued data, transfer learning of tangent-space models such as Gaussian distributions, PCA, regression, or classifiers. Though one would hope to simply use ordinary R^n transfer learning ideas, the manifold structure prevents it. We overcome this by basing our method on inner-product-preserving parallel transport, a well-known tool widely used in other problems of statistics on manifolds in computer vision. At first, this straightforward idea seems to suffer from an obvious shortcoming: Transporting large datasets is prohibitively expensive, hindering scalability. Fortunately, with our approach, we never transport data. Rather, we show how the statistical models themselves can be transported, and prove that for the tangent-space models above, the transport “commutes” with learning. Consequently, our compact framework, applicable to a large class of manifolds, is not restricted by the size of either the training or test sets. We demonstrate the approach by transferring PCA and logistic-regression models of real-world data involving 3D shapes and image descriptors.

pdf SupMat Video poster DOI Project Page [BibTex]


Robot Arm Pose Estimation through Pixel-Wise Part Classification

Bohg, J., Romero, J., Herzog, A., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA) 2014, pages: 3143-3150, June 2014 (inproceedings)

Abstract
We propose to frame the problem of marker-less robot arm pose estimation as a pixel-wise part classification problem. As input, we use a depth image in which each pixel is classified to be either from a particular robot part or the background. The classifier is a random decision forest trained on a large number of synthetically generated and labeled depth images. From all the training samples ending up at a leaf node, a set of offsets is learned that votes for relative joint positions. Pooling these votes over all foreground pixels and subsequent clustering gives us an estimate of the true joint positions. Due to the intrinsic parallelism of pixel-wise classification, this approach can run in super real-time and is more efficient than previous ICP-like methods. We quantitatively evaluate the accuracy of this approach on synthetic data. We also demonstrate that the method produces accurate joint estimates on real data despite being purely trained on synthetic data.

video code pdf DOI Project Page [BibTex]


Efficient Non-linear Markov Models for Human Motion

Lehrmann, A. M., Gehler, P. V., Nowozin, S.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 1314-1321, IEEE, June 2014 (inproceedings)

Abstract
Dynamic Bayesian networks such as Hidden Markov Models (HMMs) are successfully used as probabilistic models for human motion. The use of hidden variables makes them expressive models, but inference is only approximate and requires procedures such as particle filters or Markov chain Monte Carlo methods. In this work we propose to instead use simple Markov models that only model observed quantities. We retain a highly expressive dynamic model by using interactions that are nonlinear and non-parametric. A presentation of our approach in terms of latent variables shows logarithmic growth for the computation of exact log-likelihoods in the number of latent states. We validate our model on human motion capture data and demonstrate state-of-the-art performance on action recognition and motion completion tasks.

Project page pdf DOI Project Page [BibTex]


Grassmann Averages for Scalable Robust PCA

Hauberg, S., Feragen, A., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3810-3817, Columbus, Ohio, USA, June 2014 (inproceedings)

Abstract
As the collection of large datasets becomes increasingly automated, the occurrence of outliers will increase – "big data" implies "big outliers". While principal component analysis (PCA) is often used to reduce the size of data, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA do not scale beyond small-to-medium sized datasets. To address this, we introduce the Grassmann Average (GA), which expresses dimensionality reduction as an average of the subspaces spanned by the data. Because averages can be efficiently computed, we immediately gain scalability. GA is inherently more robust than PCA, but we show that they coincide for Gaussian data. We exploit that averages can be made robust to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. Robustness can be with respect to vectors (subspaces) or elements of vectors; we focus on the latter and use a trimmed average. The resulting Trimmed Grassmann Average (TGA) is particularly appropriate for computer vision because it is robust to pixel outliers. The algorithm has low computational complexity and minimal memory requirements, making it scalable to "big noisy data." We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie.

pdf code supplementary material tutorial video results video talk poster DOI Project Page [BibTex]

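One way to read "an average of the subspaces spanned by the data" for one-dimensional subspaces is sketched below. It assumes zero-mean input and uses a simple fixed-point iteration in which each observation is sign-flipped to agree with the current estimate before averaging. The function name, the starting point, and the iteration count are illustrative assumptions; this is not the authors' released code, and it omits the trimmed (robust) averaging that defines TGA.

```python
import math


def grassmann_average(points, iterations=20):
    """Average the 1D subspaces spanned by zero-mean points (a sketch)."""
    dim = len(points[0])
    # Deterministic start (an assumption): the longest observation.
    q = max(points, key=lambda p: sum(v * v for v in p))
    norm = math.sqrt(sum(v * v for v in q))
    q = [v / norm for v in q]
    for _ in range(iterations):
        total = [0.0] * dim
        for p in points:
            # A subspace has no sign, so flip each point to agree with q.
            sign = 1.0 if sum(a * b for a, b in zip(p, q)) >= 0 else -1.0
            for i in range(dim):
                total[i] += sign * p[i]
        norm = math.sqrt(sum(v * v for v in total))
        q = [v / norm for v in total]  # unit vector spanning the average
    return q


# Toy data scattered mostly along the first axis.
points = [(2.0, 0.1), (-3.0, 0.1), (1.0, -0.2), (-1.5, 0.05)]
direction = grassmann_average(points)
```

Replacing the plain sum with an element-wise trimmed mean would be the natural place to add robustness to outlying elements, which is the variant the abstract focuses on.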

Posebits for Monocular Human Pose Estimation

Pons-Moll, G., Fleet, D. J., Rosenhahn, B.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 2345-2352, Columbus, Ohio, USA, June 2014 (inproceedings)

Abstract
We advocate the inference of qualitative information about 3D human pose, called posebits, from images. Posebits represent boolean geometric relationships between body parts (e.g., left-leg in front of right-leg or hands close to each other). The advantages of posebits as a mid-level representation are: 1) for many tasks of interest, such qualitative pose information may be sufficient (e.g., semantic image retrieval); 2) it is relatively easy to annotate large image corpora with posebits, as it simply requires answers to yes/no questions; and 3) they help resolve challenging pose ambiguities and therefore facilitate the difficult task of image-based 3D pose estimation. We introduce posebits, a posebit database, a method for selecting useful posebits for pose estimation and a structural SVM model for posebit inference. Experiments show the use of posebits for semantic image retrieval and for improving 3D pose estimation.

pdf Project Page Project Page [BibTex]

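To make the idea of boolean geometric relationships concrete, the sketch below derives a few posebit-like attributes from known 3D joint positions. The joint names, the coordinate convention (y up, z pointing away from the camera), and the distance threshold are all hypothetical; the paper infers posebits from images, which this sketch does not attempt.

```python
import math


def posebits(joints, near=0.25):
    """Derive a few yes/no pose attributes from 3D joint positions.

    `joints` maps hypothetical joint names to (x, y, z) in metres, with y
    pointing up and z pointing away from the camera. `near` is illustrative.
    """
    lf, rf = joints["left_foot"], joints["right_foot"]
    lh, rh = joints["left_hand"], joints["right_hand"]
    return {
        # Smaller z means closer to the camera, i.e. "in front".
        "left_foot_in_front_of_right": lf[2] < rf[2],
        "hands_close": math.dist(lh, rh) < near,
        "left_hand_above_head": lh[1] > joints["head"][1],
    }


pose = {
    "left_foot": (0.1, 0.0, 1.8),
    "right_foot": (-0.1, 0.0, 2.1),
    "left_hand": (0.05, 1.0, 2.0),
    "right_hand": (-0.05, 1.0, 2.0),
    "head": (0.0, 1.7, 2.0),
}
bits = posebits(pose)
```

Each entry is a single yes/no answer, which is why such attributes are cheap to annotate by asking people simple questions about an image.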

Simultaneous Underwater Visibility Assessment, Enhancement and Improved Stereo

Roser, M., Dunbabin, M., Geiger, A.

IEEE International Conference on Robotics and Automation, pages: 3840-3847, Hong Kong, China, June 2014 (conference)

Abstract
Vision-based underwater navigation and obstacle avoidance demands robust computer vision algorithms, particularly for operation in turbid water with reduced visibility. This paper describes a novel method for the simultaneous underwater image quality assessment, visibility enhancement and disparity computation to increase stereo range resolution under dynamic, natural lighting and turbid conditions. The technique estimates the visibility properties from a sparse 3D map of the original degraded image using a physical underwater light attenuation model. Firstly, an iterated distance-adaptive image contrast enhancement enables a dense disparity computation and visibility estimation. Secondly, using a light attenuation model for ocean water, a color corrected stereo underwater image is obtained along with a visibility distance estimate. Experimental results in shallow, naturally lit, high-turbidity coastal environments show the proposed technique improves range estimation over the original images as well as image quality and color for habitat classification. Furthermore, the recursiveness and robustness of the technique allow real-time implementation onboard an Autonomous Underwater Vehicle for improved navigation and obstacle avoidance performance.

pdf DOI [BibTex]


Preserving Modes and Messages via Diverse Particle Selection

Pacheco, J., Zuffi, S., Black, M. J., Sudderth, E.

In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 32(1):1152-1160, JMLR: Workshop and Conference Proceedings, Beijing, China, June 2014 (inproceedings)

Abstract
In applications of graphical models arising in domains such as computer vision and signal processing, we often seek the most likely configurations of high-dimensional, continuous variables. We develop a particle-based max-product algorithm which maintains a diverse set of posterior mode hypotheses, and is robust to initialization. At each iteration, the set of hypotheses at each node is augmented via stochastic proposals, and then reduced via an efficient selection algorithm. The integer program underlying our optimization-based particle selection minimizes errors in subsequent max-product message updates. This objective automatically encourages diversity in the maintained hypotheses, without requiring tuning of application-specific distances among hypotheses. By avoiding the stochastic resampling steps underlying particle sum-product algorithms, we also avoid common degeneracies where particles collapse onto a single hypothesis. Our approach significantly outperforms previous particle-based algorithms in experiments focusing on the estimation of human pose from single images.

pdf SupMat link (url) Project Page Project Page [BibTex]


Calibrating and Centering Quasi-Central Catadioptric Cameras

Schoenbein, M., Strauss, T., Geiger, A.

IEEE International Conference on Robotics and Automation, pages: 4443-4450, Hong Kong, China, June 2014 (conference)

Abstract
Non-central catadioptric models are able to cope with irregular camera setups and inaccuracies in the manufacturing process but are computationally demanding and thus not suitable for robotic applications. On the other hand, calibrating a quasi-central (almost central) system with a central model introduces errors due to a wrong relationship between the viewing ray orientations and the pixels on the image sensor. In this paper, we propose a central approximation to quasi-central catadioptric camera systems that is both accurate and efficient. We observe that the distance to points in 3D is typically large compared to deviations from the single viewpoint. Thus, we first calibrate the system using a state-of-the-art non-central camera model. Next, we show that by remapping the observations we are able to match the orientation of the viewing rays of a much simpler single viewpoint model with the true ray orientations. While our approximation is general and applicable to all quasi-central camera systems, we focus on one of the most common cases in practice: hypercatadioptric cameras. We compare our model to a variety of baselines in synthetic and real localization and motion estimation experiments. We show that by using the proposed model we are able to achieve near non-central accuracy while obtaining speed-ups of more than three orders of magnitude compared to state-of-the-art non-central models.

pdf DOI [BibTex]


Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics

Hennig, P., Hauberg, S.

In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 33, pages: 347-355, JMLR: Workshop and Conference Proceedings, (Editors: S Kaski and J Corander), Microtome Publishing, Brookline, MA, April 2014 (inproceedings)

Abstract
We study a probabilistic numerical method for the solution of both boundary and initial value problems that returns a joint Gaussian process posterior over the solution. Such methods have concrete value in the statistics on Riemannian manifolds, where non-analytic ordinary differential equations are involved in virtually all computations. The probabilistic formulation permits marginalising the uncertainty of the numerical solution such that statistics are less sensitive to inaccuracies. This leads to new Riemannian algorithms for mean value computations and principal geodesic analysis. Marginalisation also means results can be less precise than point estimates, enabling a noticeable speed-up over the state of the art. Our approach is an argument for a wider point that uncertainty caused by numerical calculations should be tracked throughout the pipeline of machine learning algorithms.

pdf Youtube Supplements Project page link (url) [BibTex]


Multi-View Priors for Learning Detectors from Sparse Viewpoint Data

Pepik, B., Stark, M., Gehler, P., Schiele, B.

International Conference on Learning Representations, April 2014 (conference)

Abstract
While the majority of today's object class models provide only 2D bounding boxes, far richer output hypotheses are desirable including viewpoint, fine-grained category, and 3D geometry estimate. However, models trained to provide richer output require larger amounts of training data, preferably well covering the relevant aspects such as viewpoint and fine-grained categories. In this paper, we address this issue from the perspective of transfer learning, and design an object class model that explicitly leverages correlations between visual features. Specifically, our model represents prior distributions over permissible multi-view detectors in a parametric way -- the priors are learned once from training data of a source object class, and can later be used to facilitate the learning of a detector for a target class. As we show in our experiments, this transfer is not only beneficial for detectors based on basic-level category representations, but also enables the robust learning of detectors that represent classes at finer levels of granularity, where training data is typically even scarcer and more unbalanced. As a result, we report largely improved performance in simultaneous 2D object localization and viewpoint estimation on a recent dataset of challenging street scenes.

reviews pdf Project Page [BibTex]


NRSfM using Local Rigidity

Rehan, A., Zaheer, A., Akhter, I., Saeed, A., Mahmood, B., Usmani, M., Khan, S.

In Proceedings Winter Conference on Applications of Computer Vision, pages: 69-74, open access, IEEE, Steamboat Springs, CO, USA, March 2014 (inproceedings)

Abstract
Factorization methods for computation of nonrigid structure have limited practicality, and work well only when there is large enough camera motion between frames, with long sequences and limited or no occlusions. We show that typical nonrigid structure can often be approximated well as locally rigid sub-structures in time and space. Specifically, we assume that: 1) the structure can be approximated as rigid in a short local time window and 2) some point pairs stay relatively rigid in space, maintaining a fixed distance between them during the sequence. We first use the triangulation constraints in rigid SFM over a sliding time window to get an initial estimate of the nonrigid 3D structure. We then automatically identify relatively rigid point pairs in this structure, and use their length-constancy simultaneously with triangulation constraints to refine the structure estimate. Unlike factorization methods, the structure is estimated independent of the camera motion computation, adding to the simplicity and stability of the approach. Further, local factorization inherently handles significant natural occlusions gracefully, performing much better than the state of the art. We show more stable and accurate results as compared to the state of the art on even short sequences starting from 15 frames only, containing camera rotations as small as 2 degrees and up to 50% missing data.

link (url) [BibTex]

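The step of identifying relatively rigid point pairs can be illustrated with a small sketch: given an initial 3D structure estimate for every frame, keep the pairs whose mutual distance barely changes over the sequence. The function name, the relative-spread criterion, and the tolerance value are assumptions for illustration; the abstract does not specify how the authors select pairs.

```python
import math
from itertools import combinations


def rigid_pairs(frames, tolerance=0.05):
    """Return index pairs whose distance stays nearly constant over time.

    `frames` is a list of frames, each a list of (x, y, z) points in a
    fixed order. `tolerance` is a hypothetical bound on the relative
    spread (max - min) / mean of a pair's length across frames.
    """
    pairs = []
    for i, j in combinations(range(len(frames[0])), 2):
        lengths = [math.dist(f[i], f[j]) for f in frames]
        mean = sum(lengths) / len(lengths)
        if mean > 0 and (max(lengths) - min(lengths)) / mean <= tolerance:
            pairs.append((i, j))
    return pairs


# Points 0 and 1 translate together; point 2 moves independently.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)],
    [(2.0, 1.0, 0.0), (3.0, 1.0, 0.0), (2.0, 1.5, 0.0)],
]
selected = rigid_pairs(frames)
```

The selected pairs would then supply the length-constancy constraints that the abstract describes combining with triangulation constraints during refinement.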

Model-based Anthropometry: Predicting Measurements from 3D Human Scans in Multiple Poses

Tsoli, A., Loper, M., Black, M. J.

In Proceedings Winter Conference on Applications of Computer Vision, pages: 83-90, IEEE, March 2014 (inproceedings)

Abstract
Extracting anthropometric or tailoring measurements from 3D human body scans is important for applications such as virtual try-on, custom clothing, and online sizing. Existing commercial solutions identify anatomical landmarks on high-resolution 3D scans and then compute distances or circumferences on the scan. Landmark detection is sensitive to acquisition noise (e.g. holes) and these methods require subjects to adopt a specific pose. In contrast, we propose a solution we call model-based anthropometry. We fit a deformable 3D body model to scan data in one or more poses; this model-based fitting is robust to scan noise. This brings the scan into registration with a database of registered body scans. Then, we extract features from the registered model (rather than from the scan); these include limb lengths, circumferences, and statistical features of global shape. Finally, we learn a mapping from these features to measurements using regularized linear regression. We perform an extensive evaluation using the CAESAR dataset and demonstrate that the accuracy of our method outperforms state-of-the-art methods.

pdf DOI Project Page Project Page [BibTex]


Evaluation of feature-based 3-d registration of probabilistic volumetric scenes

Restrepo, M. I., Ulusoy, A. O., Mundy, J. L.

In ISPRS Journal of Photogrammetry and Remote Sensing, 98(0):1-18, 2014 (inproceedings)

Abstract
Automatic estimation of the world surfaces from aerial images has seen much attention and progress in recent years. Among current modeling technologies, probabilistic volumetric models (PVMs) have evolved as an alternative representation that can learn geometry and appearance in a dense and probabilistic manner. Recent progress, in terms of storage and speed, achieved in the area of volumetric modeling, opens the opportunity to develop new frameworks that make use of the PVM to pursue the ultimate goal of creating an entire map of the earth, where one can reason about the semantics and dynamics of the 3-d world. Aligning 3-d models collected at different time-instances constitutes an important step for successful fusion of large spatio-temporal information. This paper evaluates how effectively probabilistic volumetric models can be aligned using robust feature-matching techniques, while considering different scenarios that reflect the kind of variability observed across aerial video collections from different time instances. More precisely, this work investigates variability in terms of discretization, resolution and sampling density, errors in the camera orientation, and changes in illumination and geographic characteristics. All results are given for large-scale, outdoor sites. In order to facilitate the comparison of the registration performance of PVMs to that of other 3-d reconstruction techniques, the registration pipeline is also carried out using the Patch-based Multi-View Stereo (PMVS) algorithm. Registration performance is similar for scenes that have favorable geometry and the appearance characteristics necessary for high quality reconstruction. In scenes containing trees, such as a park, or many buildings, such as a city center, registration performance is significantly more accurate when using the PVM.

Publisher site link (url) DOI [BibTex]


Left Ventricle Segmentation by Dynamic Shape Constrained Random Walk

X. Yang, Y. Su, M. Wan, S. Y. Yeo, C. Lim, S. T. Wong, L. Zhong, R. S. Tan

In Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2014 (inproceedings)

Abstract
Accurate and robust extraction of the left ventricle (LV) cavity is a key step for quantitative analysis of cardiac functions. In this study, we propose an improved LV cavity segmentation method that incorporates a dynamic shape constraint into the weighting function of the random walks algorithm. The method involves an iterative process that updates an intermediate result to the desired solution. The shape constraint restricts the solution space of the segmentation result, such that the robustness of the algorithm is increased to handle misleading information that emanates from noise, weak boundaries, and clutter. Our experiments on real cardiac magnetic resonance images demonstrate that the proposed method obtains better segmentation performance than the standard method.

[BibTex]

2011


Outdoor Human Motion Capture using Inverse Kinematics and von Mises-Fisher Sampling

Pons-Moll, G., Baak, A., Gall, J., Leal-Taixe, L., Mueller, M., Seidel, H., Rosenhahn, B.

In IEEE International Conference on Computer Vision (ICCV), pages: 1243-1250, November 2011 (inproceedings)

project page pdf supplemental [BibTex]