2017


A Generative Model of People in Clothing

Lassner, C., Pons-Moll, G., Gehler, P. V.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, October 2017 (inproceedings)

Abstract
We present the first image-based generative model of people in clothing in a full-body setting. We sidestep the commonly used complex graphics rendering pipeline and the need for high-quality 3D scans of dressed people. Instead, we learn generative models from a large image database. The main challenge is to cope with the high variance in human pose, shape and appearance. For this reason, pure image-based approaches have not been considered so far. We show that this challenge can be overcome by splitting the generating process into two parts. First, we learn to generate a semantic segmentation of the body and clothing. Second, we learn a conditional model on the resulting segments that creates realistic images. The full model is differentiable and can be conditioned on pose, shape or color. The results are samples of people in different clothing items and styles. The proposed model can generate entirely new people with realistic clothing. In several experiments we present encouraging results that suggest an entirely data-driven approach to people generation is possible.

link (url) Project Page [BibTex]



Semantic Video CNNs through Representation Warping

Gadde, R., Jampani, V., Gehler, P. V.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, October 2017 (inproceedings) Accepted

Abstract
In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp and we demonstrate its use for a range of network architectures. The main design principle is to use optical flow of adjacent frames for warping internal network representations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to-end training. Experiments validate that the proposed approach incurs only little extra computational cost, while improving performance, when video streams are available. We achieve new state-of-the-art results on the standard CamVid and Cityscapes benchmark datasets and show reliable improvements over different baseline networks. Our code and models are available at http://segmentation.is.tue.mpg.de

pdf Supplementary Project Page [BibTex]
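
The warping idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' NetWarp module: the single-channel feature map, the backward-flow convention (each flow vector points from a pixel in the current frame to its source in the previous frame), the bilinear sampling, and the fixed blending weight in the hypothetical `fuse` helper are all assumptions made for illustration.

```python
import math


def warp_features(prev_feat, flow):
    """Bilinearly sample prev_feat (H x W floats) at positions displaced by
    flow (H x W pairs of (dx, dy)), clamping samples to the grid border."""
    h, w = len(prev_feat), len(prev_feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Source location in the previous frame (assumed backward flow).
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            top = (1 - ax) * prev_feat[y0][x0] + ax * prev_feat[y0][x1]
            bottom = (1 - ax) * prev_feat[y1][x0] + ax * prev_feat[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


def fuse(curr_feat, warped_prev, weight=0.5):
    """Blend current features with warped previous features.
    The fixed weight is a placeholder for whatever a network would learn."""
    return [
        [(1 - weight) * c + weight * p for c, p in zip(curr_row, prev_row)]
        for curr_row, prev_row in zip(curr_feat, warped_prev)
    ]


prev_feat = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
flow = [[(1.0, 0.0)] * 3 for _ in range(2)]  # every pixel looks one step right
warped = warp_features(prev_feat, flow)
fused = fuse([[0.0] * 3 for _ in range(2)], warped)
```

In a real network the same sampling would be applied per channel to an intermediate layer, with the flow resized to that layer's resolution.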

A simple yet effective baseline for 3d human pose estimation

Martinez, J., Hossain, R., Romero, J., Little, J. J.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, October 2017 (inproceedings)

Abstract
Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3-dimensional positions. With the goal of understanding these sources of error, we set out to build a system that given 2d joint locations predicts 3d positions. Much to our surprise, we have found that, with current technology, "lifting" ground truth 2d joint locations to 3d space is a task that can be solved with a remarkably low error rate: a relatively simple deep feed-forward network outperforms the best reported result by about 30% on Human3.6M, the largest publicly available 3d pose estimation benchmark. Furthermore, training our system on the output of an off-the-shelf state-of-the-art 2d detector (i.e., using images as input) yields state-of-the-art results; this includes an array of systems that have been trained end-to-end specifically for this task. Our results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggest directions to further advance the state of the art in 3d human pose estimation.

video code arxiv pdf preprint Project Page [BibTex]
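
To make the "lifting" task above concrete, here is a minimal sketch of a feed-forward network that maps 2d joint coordinates to 3d joint coordinates. It is not the paper's architecture: the joint count, hidden width, ReLU activations, and random untrained weights are assumptions chosen only to show the input and output shapes.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def lift_2d_to_3d(layers, joints_2d):
    """Flatten (x, y) joints, run the network, and regroup into (x, y, z)."""
    x = [coord for joint in joints_2d for coord in joint]
    for layer in layers[:-1]:
        x = relu(linear(layer, x))
    y = linear(layers[-1], x)
    return [tuple(y[i:i + 3]) for i in range(0, len(y), 3)]


NUM_JOINTS = 17  # assumed skeleton size, for illustration only
HIDDEN = 32      # assumed hidden width, far smaller than a practical model
rng = random.Random(0)
layers = [
    init_layer(2 * NUM_JOINTS, HIDDEN, rng),
    init_layer(HIDDEN, HIDDEN, rng),
    init_layer(HIDDEN, 3 * NUM_JOINTS, rng),
]
joints_2d = [(0.1 * i, 0.2 * i) for i in range(NUM_JOINTS)]
pose_3d = lift_2d_to_3d(layers, joints_2d)
```

With trained weights, `pose_3d` would hold one estimated 3d position per input joint; here it only demonstrates the data flow.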

Parameterized Model of 2D Articulated Human Shape

Black, M. J., Freifeld, O., Weiss, A., Loper, M., Guan, P.

September 2017, U.S. Patent 9,761,060 (misc)

Abstract
Disclosed are computer-readable devices, systems and methods for generating a model of a clothed body. The method includes generating a model of an unclothed human body, the model capturing a shape or a pose of the unclothed human body, determining two-dimensional contours associated with the model, and computing deformations by aligning a contour of a clothed human body with a contour of the unclothed human body. Based on the two-dimensional contours and the deformations, the method includes generating a first two-dimensional model of the unclothed human body, the first two-dimensional model factoring the deformations of the unclothed human body into one or more of a shape variation component, a viewpoint change, and a pose variation and learning an eigen-clothing model using principal component analysis applied to the deformations, wherein the eigen-clothing model classifies different types of clothing, to yield a second two-dimensional model of a clothed human body.

Google Patents [BibTex]


Effects of animation retargeting on perceived action outcomes

Kenny, S., Mahmood, N., Honda, C., Black, M. J., Troje, N. F.

Proceedings of the ACM Symposium on Applied Perception (SAP’17), pages: 2:1-2:7, September 2017 (conference)

Abstract
The individual shape of the human body, including the geometry of its articulated structure and the distribution of weight over that structure, influences the kinematics of a person's movements. How sensitive is the visual system to inconsistencies between shape and motion introduced by retargeting motion from one person onto the shape of another? We used optical motion capture to record five pairs of male performers with large differences in body weight, while they pushed, lifted, and threw objects. Based on a set of 67 markers, we estimated both the kinematics of the actions as well as the performer's individual body shape. To obtain consistent and inconsistent stimuli, we created animated avatars by combining the shape and motion estimates from either a single performer or from different performers. In a virtual reality environment, observers rated the perceived weight or thrown distance of the objects. They were also asked to explicitly discriminate between consistent and hybrid stimuli. Observers were unable to accomplish the latter, but hybridization of shape and motion influenced their judgements of action outcome in systematic ways. Inconsistencies between shape and motion were assimilated into an altered perception of the action outcome.

pdf DOI [BibTex]

Coupling Adaptive Batch Sizes with Learning Rates

Balles, L., Romero, J., Hennig, P.

In Proceedings Conference on Uncertainty in Artificial Intelligence (UAI) 2017, pages: 410-419, (Editors: Gal Elidan and Kristian Kersting), Association for Uncertainty in Artificial Intelligence (AUAI), August 2017 (inproceedings)

Abstract
Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of the gradient estimates. This variance also changes over the optimization process; when using a constant batch size, stability and convergence are thus often enforced by means of a (manually tuned) decreasing learning rate schedule. We propose a practical method for dynamic batch size adaptation. It estimates the variance of the stochastic gradients and adapts the batch size to decrease the variance proportionally to the value of the objective function, removing the need for the aforementioned learning rate decrease. In contrast to recent related work, our algorithm couples the batch size to the learning rate, directly reflecting the known relationship between the two. On three image classification benchmarks, our batch size adaptation yields faster optimization convergence, while simultaneously simplifying learning rate tuning. A TensorFlow implementation is available.

Code link (url) Project Page [BibTex]
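
One way to read the batch size rule described above is sketched below. The abstract does not give the formula, so the specific form used here (batch size proportional to learning rate times gradient variance divided by the objective value), the clipping bounds, and the function names are assumptions, not the released TensorFlow implementation.

```python
def gradient_variance(per_example_grads):
    """Sum over coordinates of the sample variance of per-example gradients."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    total = 0.0
    for d in range(dim):
        column = [g[d] for g in per_example_grads]
        mean = sum(column) / n
        total += sum((v - mean) ** 2 for v in column) / (n - 1)
    return total


def adapt_batch_size(learning_rate, variance, loss, min_size=16, max_size=4096):
    """Assumed rule: keep the mini-batch gradient variance (variance / m)
    proportional to loss / learning_rate, i.e. m = learning_rate * variance / loss."""
    if loss <= 0.0:
        return max_size
    proposal = learning_rate * variance / loss
    return int(min(max(round(proposal), min_size), max_size))


grads = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
variance = gradient_variance(grads)
early = adapt_batch_size(0.1, 4000.0, 2.0)  # high loss: smaller batch suffices
late = adapt_batch_size(0.1, 4000.0, 0.5)   # lower loss: larger batch
```

Under this reading, the batch size grows as the objective value falls, which plays the role that a decreasing learning rate schedule would otherwise play.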

Crowdshaping Realistic 3D Avatars with Words

Streuber, S., Ramirez, M. Q., Black, M., Zuffi, S., O’Toole, A., Hill, M. Q., Hahn, C. A.

August 2017, Application PCT/EP2017/051954 (misc)

Abstract
A method for generating a body shape, comprising the steps:
- receiving one or more linguistic descriptors related to the body shape;
- retrieving an association between the one or more linguistic descriptors and a body shape; and
- generating the body shape, based on the association.

Google Patents [BibTex]

Joint Graph Decomposition and Node Labeling by Local Search

Levinkov, E., Uhrig, J., Tang, S., Omran, M., Insafutdinov, E., Kirillov, A., Rother, C., Brox, T., Schiele, B., Andres, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1904-1912, IEEE, July 2017 (inproceedings)

PDF Supplementary DOI Project Page [BibTex]

Dynamic FAUST: Registering Human Bodies in Motion

Bogo, F., Romero, J., Pons-Moll, G., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
While the ready availability of 3D scan data has influenced research throughout computer vision, less attention has focused on 4D data, that is, 3D scans of moving nonrigid objects captured over time. To be useful for vision research, such 4D scans need to be registered, or aligned, to a common topology. Consequently, extending mesh registration methods to 4D is important. Unfortunately, no ground-truth datasets are available for quantitative evaluation and comparison of 4D registration methods. To address this we create a novel dataset of high-resolution 4D scans of human subjects in motion, captured at 60 fps. We propose a new mesh registration method that uses both 3D geometry and texture information to register all scans in a sequence to a common reference topology. The approach exploits consistency in texture over both short and long time intervals and deals with temporal offsets between shape and texture capture. We show how using geometry alone results in significant errors in alignment when the motions are fast and non-rigid. We evaluate the accuracy of our registration and provide a dataset of 40,000 raw and aligned meshes. Dynamic FAUST extends the popular FAUST dataset to dynamic 4D data, and is available for research purposes at http://dfaust.is.tue.mpg.de.

pdf video Project Page Project Page Project Page [BibTex]

Learning from Synthetic Humans

Varol, G., Romero, J., Martin, X., Mahmood, N., Black, M. J., Laptev, I., Schmid, C.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
Estimating human pose, shape, and motion from images and videos are fundamental challenges with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.

arXiv project data Project Page Project Page [BibTex]

On human motion prediction using recurrent neural networks

Martinez, J., Black, M. J., Romero, J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.

arXiv Project Page [BibTex]
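
A baseline that "does not attempt to model motion at all" can be as simple as repeating the last observed pose for every future frame. The sketch below assumes that zero-velocity form; the pose layout (a flat list of joint values per frame) and the error metric are illustrative choices, not the paper's evaluation protocol.

```python
def zero_velocity_forecast(observed_poses, horizon):
    """Predict `horizon` future frames by copying the last observed pose."""
    last = observed_poses[-1]
    return [list(last) for _ in range(horizon)]


def mean_abs_error(predicted, target):
    """Average absolute difference over all frames and joint values."""
    diffs = [
        abs(p - t)
        for pred_frame, target_frame in zip(predicted, target)
        for p, t in zip(pred_frame, target_frame)
    ]
    return sum(diffs) / len(diffs)


observed = [[0.0, 0.0], [1.0, 2.0]]
future = [[1.0, 2.0], [1.5, 2.0], [2.0, 2.0]]
forecast = zero_velocity_forecast(observed, horizon=3)
error = mean_abs_error(forecast, future)
```

Because poses change slowly over short horizons, such a forecast starts with low error and degrades as the true motion drifts away from the last frame.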

Articulated Multi-person Tracking in the Wild

Insafutdinov, E., Andriluka, M., Pishchulin, L., Tang, S., Levinkov, E., Andres, B., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1293-1301, IEEE, July 2017, Oral (inproceedings)

DOI [BibTex]

Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data

Janai, J., Güney, F., Wulff, J., Black, M., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 1406-1416, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
Existing optical flow datasets are limited in size and variability due to the difficulty of capturing dense ground truth. In this paper, we tackle this problem by tracking pixels through densely sampled space-time volumes recorded with a high-speed video camera. Our model exploits the linearity of small motions and reasons about occlusions from multiple frames. Using our technique, we are able to establish accurate reference flow fields outside the laboratory in natural environments. Besides, we show how our predictions can be used to augment the input images with realistic motion blur. We demonstrate the quality of the produced flow fields on synthetic and real-world datasets. Finally, we collect a novel challenging optical flow dataset by applying our technique on data from a high-speed camera and analyze the performance of the state-of-the-art in optical flow under various levels of motion blur.

pdf suppmat Project page Video DOI Project Page [BibTex]

Optical Flow in Mostly Rigid Scenes

Wulff, J., Sevilla-Lara, L., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 6911-6920, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
The optical flow of natural scenes is a combination of the motion of the observer and the independent motion of objects. Existing algorithms typically focus on either recovering motion and structure under the assumption of a purely static world or optical flow for general unconstrained scenes. We combine these approaches in an optical flow algorithm that estimates an explicit segmentation of moving objects from appearance and physical constraints. In static regions we take advantage of strong constraints to jointly estimate the camera motion and the 3D structure of the scene over multiple frames. This allows us to also regularize the structure instead of the motion. Our formulation uses a Plane+Parallax framework, which works even under small baselines, and reduces the motion estimation to a one-dimensional search problem, resulting in more accurate estimation. In moving regions the flow is treated as unconstrained, and computed with an existing optical flow method. The resulting Mostly-Rigid Flow (MR-Flow) method achieves state-of-the-art results on both the MPI-Sintel and KITTI-2015 benchmarks.

pdf SupMat video code Project Page [BibTex]

OctNet: Learning Deep 3D Representations at High Resolutions

Riegler, G., Ulusoy, O., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
We present OctNet, a representation for deep learning with sparse 3D data. In contrast to existing models, our representation enables 3D convolutional networks which are both deep and high resolution. Towards this goal, we exploit the sparsity in the input data to hierarchically partition the space using a set of unbalanced octrees where each leaf node stores a pooled feature representation. This allows us to focus memory allocation and computation on the relevant dense regions and enables deeper networks without compromising resolution. We demonstrate the utility of our OctNet representation by analyzing the impact of resolution on several 3D tasks including 3D object classification, orientation estimation and point cloud labeling.

pdf suppmat Project Page Video Project Page [BibTex]

Reflectance Adaptive Filtering Improves Intrinsic Image Estimation

Nestmeyer, T., Gehler, P. V.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1771-1780, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

pre-print DOI Project Page Project Page [BibTex]

Detailed, accurate, human shape estimation from clothed 3D scan sequences

Zhang, C., Pujades, S., Black, M., Pons-Moll, G.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, Washington, DC, USA, July 2017, Spotlight (inproceedings)

Abstract
We address the problem of estimating human body shape from 3D scans over time. Reliable estimation of 3D body shape is necessary for many applications including virtual try-on, health monitoring, and avatar creation for virtual reality. Scanning bodies in minimal clothing, however, presents a practical barrier to these applications. We address this problem by estimating body shape under clothing from a sequence of 3D scans. Previous methods that have exploited statistical models of body shape produce overly smooth shapes lacking personalized details. In this paper we contribute a new approach to recover not only an approximate shape of the person, but also their detailed shape. Our approach allows the estimated shape to deviate from a parametric model to fit the 3D scans. We demonstrate the method using high quality 4D data as well as sequences of visual hulls extracted from multi-view images. We also make available a new high quality 4D dataset that enables quantitative evaluation. Our method outperforms the previous state of the art, both qualitatively and quantitatively.

arxiv_preprint video dataset pdf supplemental DOI Project Page [BibTex]

3D Menagerie: Modeling the 3D Shape and Pose of Animals

Zuffi, S., Kanazawa, A., Jacobs, D., Black, M. J.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 5524-5532, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
There has been significant work on learning realistic, articulated, 3D models of the human body. In contrast, there are few such models of animals, despite many applications. The main challenge is that animals are much less cooperative than humans. The best human body models are learned from thousands of 3D scans of people in specific poses, which is infeasible with live animals. Consequently, we learn our model from a small set of 3D scans of toy figurines in arbitrary poses. We employ a novel part-based shape model to compute an initial registration to the scans. We then normalize their pose, learn a statistical shape model, and refine the registrations and the model together. In this way, we accurately align animal scans from different quadruped families with very different shapes and poses. With the registration to a common template we learn a shape space representing animals including lions, cats, dogs, horses, cows and hippos. Animal shapes can be sampled from the model, posed, animated, and fit to data. We demonstrate generalization by fitting it to images of real animals including species not seen in training.

pdf video Project Page [BibTex]

Optical Flow Estimation using a Spatial Pyramid Network

Ranjan, A., Black, M.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
We learn to compute optical flow by combining a classical spatial-pyramid formulation with deep learning. This estimates large motions in a coarse-to-fine approach by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow. Instead of the standard minimization of an objective function at each pyramid level, we train one deep network per level to compute the flow update. Unlike the recent FlowNet approach, the networks do not need to deal with large motions; these are dealt with by the pyramid. This has several advantages. First, our Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters. This makes it more efficient and appropriate for embedded applications. Second, since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate. Third, unlike FlowNet, the learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it. Our results are more accurate than FlowNet on most standard benchmarks, suggesting a new direction of combining classical flow methods with deep learning.

pdf SupMat project/code [BibTex]
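
The coarse-to-fine scheme described above (warp at each pyramid level by the current flow, then add a per-level update) can be sketched on 1D signals. This is an illustration of the control flow only: `estimate_update` is a hypothetical stand-in for the per-level network, the pair-averaging pyramid and linear-interpolation warp are assumptions, and the signal length is assumed to be divisible by 2 to the power of (levels - 1).

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]


def upsample_flow(flow):
    """Double the resolution; displacements double along with it."""
    return [2.0 * f for f in flow for _ in range(2)]


def warp(signal, flow):
    """Sample signal at x + flow[x] with linear interpolation, clamped at borders."""
    n = len(signal)
    out = []
    for x, f in enumerate(flow):
        s = min(max(x + f, 0.0), n - 1.0)
        x0 = int(s)
        x1 = min(x0 + 1, n - 1)
        a = s - x0
        out.append((1 - a) * signal[x0] + a * signal[x1])
    return out


def coarse_to_fine_flow(first, second, levels, estimate_update):
    """Accumulate flow from the coarsest to the finest pyramid level."""
    pyramid = [(first, second)]
    for _ in range(levels - 1):
        f, s = pyramid[-1]
        pyramid.append((downsample(f), downsample(s)))
    flow = [0.0] * len(pyramid[-1][0])
    for level, (f, s) in enumerate(reversed(pyramid)):
        if level > 0:
            flow = upsample_flow(flow)
        warped = warp(s, flow)               # bring the second signal towards the first
        update = estimate_update(f, warped)  # placeholder for a learned per-level model
        flow = [a + b for a, b in zip(flow, update)]
    return flow


signal = [float(i) for i in range(8)]
# Dummy estimator that always proposes a residual of +1, to show accumulation.
flow = coarse_to_fine_flow(signal, signal, 3, lambda f, w: [1.0] * len(f))
```

With the dummy estimator, the three levels contribute 1, then 2 * 1 + 1 = 3, then 2 * 3 + 1 = 7, so small per-level updates add up to a large displacement at full resolution.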

Multiple People Tracking by Lifted Multicut and Person Re-identification

Tang, S., Andriluka, M., Andres, B., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 3701-3710, IEEE Computer Society, Washington, DC, USA, July 2017 (inproceedings)

DOI Project Page [BibTex]

Video Propagation Networks

Jampani, V., Gadde, R., Gehler, P. V.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

pdf supplementary arXiv project page code Project Page [BibTex]

Generating Descriptions with Grounded and Co-Referenced People

Rohrbach, A., Rohrbach, M., Tang, S., Oh, S. J., Schiele, B.

In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 4196-4206, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

PDF DOI Project Page [BibTex]

Semantic Multi-view Stereo: Jointly Estimating Objects and Voxels

Ulusoy, A. O., Black, M. J., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
Dense 3D reconstruction from RGB images is a highly ill-posed problem due to occlusions, textureless or reflective surfaces, as well as other challenges. We propose object-level shape priors to address these ambiguities. Towards this goal, we formulate a probabilistic model that integrates multi-view image evidence with 3D shape information from multiple objects. Inference in this model yields a dense 3D reconstruction of the scene as well as the existence and precise 3D pose of the objects in it. Our approach is able to recover fine details not captured in the input shapes while defaulting to the input models in occluded regions where image evidence is weak. Due to its probabilistic nature, the approach is able to cope with the approximate geometry of the 3D models as well as input shapes that are not present in the scene. We evaluate the approach quantitatively on several challenging indoor and outdoor datasets.

YouTube pdf suppmat Project Page [BibTex]

Deep representation learning for human motion prediction and classification

Bütepage, J., Black, M., Kragic, D., Kjellström, H.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
Generative models of 3D human motion are often restricted to a small number of activities and can therefore not generalize well to novel movements or applications. In this work we propose a deep learning framework for human motion capture data that learns a generic representation from a large corpus of motion capture data and generalizes well to new, unseen, motions. Using an encoding-decoding network that learns to predict future 3D poses from the most recent past, we extract a feature representation of human motion. Most work on deep learning for sequence prediction focuses on video and speech. Since skeletal data has a different structure, we present and evaluate different network architectures that make different assumptions about time dependencies and limb correlations. To quantify the learned features, we use the output of different layers for action classification and visualize the receptive fields of the network units. Our method outperforms the recent state of the art in skeletal motion prediction even though these use action specific training data. Our results show that deep feedforward networks, trained from a generic mocap database, can successfully be used for feature extraction from human motion data and that this representation can be used as a foundation for classification and prediction.

arXiv Project Page [BibTex]

Unite the People: Closing the Loop Between 3D and 2D Human Representations

Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M. J., Gehler, P. V.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017 (inproceedings)

Abstract
3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits “in-the-wild”. However, depending on the level of detail, it can be hard to impossible to acquire labeled data for training 2D estimators at large scale. We propose a hybrid approach to this problem: with an extended version of the recently introduced SMPLify method, we obtain high quality 3D body model fits for multiple human pose datasets. Human annotators solely sort good and bad fits. This procedure leads to an initial dataset, UP-3D, with rich annotations. With a comprehensive set of experiments, we show how this data can be used to train discriminative models that produce results with an unprecedented level of detail: our models predict 31 segments and 91 landmark locations on the body. Using the 91 landmark pose estimator, we present state-of-the-art results for 3D human pose and shape estimation using an order of magnitude less training data and without assumptions about gender or pose in the fitting procedure. We show that UP-3D can be enhanced with these improved fits to grow in quantity and quality, which makes the system deployable at large scale. The data, code and models are available for research purposes.

arXiv project/code/data Project Page [BibTex]

System and method for simulating realistic clothing

Black, M. J., Guan, P.

June 2017, U.S. Patent 9,679,409 B2 (misc)

Abstract
Systems, methods, and computer-readable storage media for simulating realistic clothing. The system generates a clothing deformation model for a clothing type, wherein the clothing deformation model factors a change of clothing shape due to rigid limb rotation, pose-independent body shape, and pose-dependent deformations. Next, the system generates a custom-shaped garment for a given body by mapping, via the clothing deformation model, body shape parameters to clothing shape parameters. The system then automatically dresses the given body with the custom-shaped garment.

Google Patents pdf [BibTex]


Towards Accurate Marker-less Human Shape and Pose Estimation over Time

Huang, Y., Bogo, F., Lassner, C., Kanazawa, A., Gehler, P. V., Romero, J., Akhter, I., Black, M. J.

In International Conference on 3D Vision (3DV), pages: 421-430, 2017 (inproceedings)

Abstract
Existing markerless motion capture methods often assume known backgrounds, static cameras, and sequence specific motion priors, limiting their application scenarios. Here we present a fully automatic method that, given multiview videos, estimates 3D human pose and body shape. We take the recently proposed SMPLify method [12] as the base method and extend it in several ways. First we fit a 3D human body model to 2D features detected in multi-view images. Second, we use a CNN method to segment the person in each image and fit the 3D body model to the contours, further improving accuracy. Third we utilize a generic and robust DCT temporal prior to handle the left and right side swapping issue sometimes introduced by the 2D pose estimator. Validation on standard benchmarks shows our results are comparable to the state of the art and also provide a realistic 3D shape avatar. We also demonstrate accurate results on HumanEva and on challenging monocular sequences of dancing from YouTube.

Code pdf DOI Project Page [BibTex]
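
The abstract above mentions a DCT temporal prior without giving its form. As a hedged illustration of the general idea, the sketch below projects a single joint coordinate's trajectory onto its lowest-frequency DCT-II basis vectors, which suppresses frame-to-frame jumps such as a momentary left/right swap. The choice of basis, the number of coefficients, and the use of a hard projection rather than a penalty term are assumptions.

```python
import math


def dct_basis(n, k):
    """Orthonormal DCT-II basis vector of frequency k for a length-n sequence."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)]


def low_frequency_projection(trajectory, num_coeffs):
    """Reconstruct the trajectory from its first num_coeffs DCT coefficients."""
    n = len(trajectory)
    smooth = [0.0] * n
    for k in range(num_coeffs):
        basis = dct_basis(n, k)
        coeff = sum(b * x for b, x in zip(basis, trajectory))
        smooth = [s + coeff * b for s, b in zip(smooth, basis)]
    return smooth


# A mostly static coordinate with a one-frame outlier, as a swap might produce.
trajectory = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
smoothed = low_frequency_projection(trajectory, num_coeffs=2)
exact = low_frequency_projection(trajectory, num_coeffs=len(trajectory))
```

Keeping all coefficients reproduces the input, while keeping only a few yields a smooth curve in which the outlier frame is strongly damped.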

2013


Learning People Detectors for Tracking in Crowded Scenes

Tang, S., Andriluka, M., Milan, A., Schindler, K., Roth, S., Schiele, B.

In 2013 IEEE International Conference on Computer Vision, pages: 1049-1056, IEEE, December 2013 (inproceedings)

PDF DOI [BibTex]

Strong Appearance and Expressive Spatial Models for Human Pose Estimation

Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.

In International Conference on Computer Vision (ICCV), pages: 3487-3494, IEEE, December 2013 (inproceedings)

Abstract
Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state of the art in articulated pose estimation in two ways. First, we explore various types of appearance representations aiming to substantially improve the body part hypotheses. Second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structure spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the "Leeds Sports Poses" and "Parse" benchmarks.

pdf DOI Project Page [BibTex]

Methods and Applications for Distance Based ANN Training

Lassner, C., Lienhart, R.

In IEEE International Conference on Machine Learning and Applications (ICMLA), December 2013 (inproceedings)

Abstract
Feature learning aims to take away the hassle of hand-designing features for machine learning tasks. Since the feature design process is tedious and requires a lot of experience, an automated solution is of great interest. However, an important problem in this field is that usually no objective values are available to fit a feature learning function to. Artificial Neural Networks are a sufficiently flexible tool for function approximation to be able to avoid this problem. We show how the error function of an ANN can be modified such that it works solely with objective distances instead of objective values. We derive the adjusted rules for backpropagation through networks with arbitrary depths and include practical considerations that must be taken into account to apply difference based learning successfully. On all three benchmark datasets we use, linear SVMs trained on automatically learned ANN features outperform RBF kernel SVMs trained on the raw data. This can be achieved in a feature space with as little as a tenth of the number of original data dimensions. We conclude our work with two experiments on distance based ANN training in two further fields: data visualization and outlier detection.
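
To make the idea of an error defined on objective distances rather than objective values concrete, here is a minimal sketch of one plausible pairwise error. The abstract does not give the exact error function, so the squared-difference form, the Euclidean output distance, and the name `distance_error` are assumptions.

```python
import math
from typing import Sequence


def distance_error(
    outputs: Sequence[Sequence[float]],
    target_dist: Sequence[Sequence[float]],
) -> float:
    """Sum over sample pairs of (output distance - objective distance)^2.

    `outputs[i]` is the network output for sample i; `target_dist[i][j]` is the
    desired distance between samples i and j. No per-sample target value is needed.
    """
    total = 0.0
    n = len(outputs)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(outputs[i], outputs[j])  # Euclidean distance in output space
            total += (d - target_dist[i][j]) ** 2
    return total
```

With two outputs `[0, 0]` and `[3, 4]` and an objective distance of 5, the error vanishes; any other objective distance yields a positive error that a gradient-based trainer could reduce.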

pdf [BibTex]

Understanding High-Level Semantics by Modeling Traffic Patterns

Zhang, H., Geiger, A., Urtasun, R.

In International Conference on Computer Vision, pages: 3056-3063, Sydney, Australia, December 2013 (inproceedings)

Abstract
In this paper, we are interested in understanding the semantics of outdoor scenes in the context of autonomous driving. Towards this goal, we propose a generative model of 3D urban scenes which is able to reason not only about the geometry and objects present in the scene, but also about the high-level semantics in the form of traffic patterns. We found that a small number of patterns is sufficient to model the vast majority of traffic scenes and show how these patterns can be learned. As evidenced by our experiments, this high-level reasoning significantly improves the overall scene estimation as well as the vehicle-to-lane association when compared to state-of-the-art approaches. All data and code will be made available upon publication.

pdf [BibTex]

A Non-parametric Bayesian Network Prior of Human Pose

Lehrmann, A. M., Gehler, P., Nowozin, S.

In Proceedings IEEE Conf. on Computer Vision (ICCV), pages: 1281-1288, December 2013 (inproceedings)

Abstract
Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model's ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.

Project page pdf DOI Project Page [BibTex]

Towards understanding action recognition

Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M. J.

In IEEE International Conference on Computer Vision (ICCV), pages: 3192-3199, IEEE, Sydney, Australia, December 2013 (inproceedings)

Abstract
Although action recognition in videos is widely studied, current methods often fail on real-world datasets. Many recent approaches improve accuracy and robustness to cope with challenging video sequences, but it is often unclear what affects the results most. This paper attempts to provide insights based on a systematic performance evaluation using thoroughly-annotated data of human actions. We annotate human Joints for the HMDB dataset (J-HMDB). This annotation can be used to derive ground truth optical flow and segmentation. We evaluate current methods using this dataset and systematically replace the output of various algorithms with ground truth. This enables us to discover what is important – for example, should we work on improving flow algorithms, estimating human bounding boxes, or enabling pose estimation? In summary, we find that high-level pose features greatly outperform low/mid level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information. We also find that the accuracy of a top-performing action recognition framework can be greatly increased by refining the underlying low/mid level features; this suggests it is important to improve optical flow and human detection algorithms. Our analysis and the J-HMDB dataset should facilitate a deeper understanding of action recognition algorithms.

Website Errata Poster Paper Slides DOI Project Page Project Page Project Page [BibTex]

Mixing Decoded Cursor Velocity and Position from an Offline Kalman Filter Improves Cursor Control in People with Tetraplegia

Homer, M., Harrison, M., Black, M. J., Perge, J., Cash, S., Friehs, G., Hochberg, L.

In 6th International IEEE EMBS Conference on Neural Engineering, pages: 715-718, San Diego, November 2013 (inproceedings)

Abstract
Kalman filtering is a common method to decode neural signals from the motor cortex. In clinical research investigating the use of intracortical brain computer interfaces (iBCIs), the technique enabled people with tetraplegia to control assistive devices such as a computer or robotic arm directly from their neural activity. For reaching movements, the Kalman filter typically estimates the instantaneous endpoint velocity of the control device. Here, we analyzed attempted arm/hand movements by people with tetraplegia to control a cursor on a computer screen to reach several circular targets. A standard velocity Kalman filter is enhanced to additionally decode for the cursor’s position. We then mix decoded velocity and position to generate cursor movement commands. We analyzed data, offline, from two participants across six sessions. Root mean squared error between the actual and estimated cursor trajectory improved by 12.2 ± 10.5% (pairwise t-test, p<0.05) as compared to a standard velocity Kalman filter. The findings suggest that simultaneously decoding for intended velocity and position and using them both to generate movement commands can improve the performance of iBCIs.
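
The abstract does not say how decoded velocity and position are mixed. One simple possibility, shown here purely as an assumption, is a convex blend between the velocity-integrated cursor position and the directly decoded position. The names `mix_cursor_update`, `alpha`, and `rmse` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def mix_cursor_update(
    prev_pos: Vec, decoded_vel: Vec, decoded_pos: Vec, dt: float, alpha: float
) -> Vec:
    """Blend a velocity-integrated position with a directly decoded position.

    alpha = 0 reproduces a velocity-only decoder; alpha = 1 uses only the
    decoded position. The blend weight is an illustrative assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    integrated = (
        prev_pos[0] + decoded_vel[0] * dt,
        prev_pos[1] + decoded_vel[1] * dt,
    )
    return (
        (1.0 - alpha) * integrated[0] + alpha * decoded_pos[0],
        (1.0 - alpha) * integrated[1] + alpha * decoded_pos[1],
    )


def rmse(actual: Sequence[Vec], estimated: Sequence[Vec]) -> float:
    """Root mean squared Euclidean error between two cursor trajectories."""
    squared = [
        (a[0] - e[0]) ** 2 + (a[1] - e[1]) ** 2 for a, e in zip(actual, estimated)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Sweeping `alpha` on recorded sessions and comparing `rmse` against the velocity-only case (`alpha = 0`) would be one way to run the kind of offline comparison the abstract describes.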

pdf Project Page [BibTex]

Multi-Robot Cooperative Object Tracking Based on Particle Filters

Ahmad, A., Lima, P.

In Robotics and Autonomous Systems, 61(10):1084-1093, October 2013 (inproceedings)

Abstract
This article presents a cooperative approach for tracking a moving object by a team of mobile robots equipped with sensors, in a highly dynamic environment. The tracker’s core is a particle filter, modified to handle, within a single unified framework, the problem of complete or partial occlusion for some of the involved mobile sensors, as well as inconsistent estimates in the global frame among sensors, due to observation errors and/or self-localization uncertainty. We present results supporting our approach by applying it to a team of real soccer robots tracking a soccer ball.

link (url) DOI [BibTex]

Puppet Flow

Zuffi, S., Black, M. J.

(7), Max Planck Institute for Intelligent Systems, October 2013 (techreport)

Abstract
We introduce Puppet Flow (PF), a layered model describing the optical flow of a person in a video sequence. We consider video frames composed of two layers: a foreground layer corresponding to a person, and a background layer. We model the background as an affine flow field. The foreground layer, being a moving person, requires reasoning about the articulated nature of the human body. We thus represent the foreground layer with the Deformable Structures model (DS), a parametrized 2D part-based human body representation. We call the motion field defined through articulated motion and deformation of the DS model a Puppet Flow. By exploiting the DS representation, Puppet Flow is a parametrized optical flow field, where parameters are the person's pose, gender and body shape.

pdf Project Page Project Page [BibTex]

D2.1.4 RoCKIn@Work - Innovation in Mobile Industrial Manipulation Competition Design, Rule Book, and Scenario Construction

Ahmad, A., Awaad, I., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schneider, S.

(FP7-ICT-601012 Revision 0.7), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, sep 2013 (techreport)

Abstract
RoCKIn is an EU-funded project aiming to foster scientific progress and innovation in cognitive systems and robotics through the design and implementation of competitions. An additional objective of RoCKIn is to increase public awareness of the current state of the art in robotics in Europe and to demonstrate the innovation potential of robotics applications for solving societal challenges and improving the competitiveness of Europe in the global markets. In order to achieve these objectives, RoCKIn develops two competitions, one for domestic service robots (RoCKIn@Home) and one for industrial robots in factories (RoCKIn@Work). These competitions are designed around challenges that are based on easy-to-communicate and convincing user stories, which catch the interest of both the general public and the scientific community. The latter is in particular interested in solving open scientific challenges and in thoroughly assessing, comparing, and evaluating the developed approaches with competing ones. To allow this to happen, the competitions are designed to meet the requirements of benchmarking procedures and good experimental methods. The integration of benchmarking technology with the competition concept is one of the main objectives of RoCKIn. This document describes the first version of the RoCKIn@Work competition, which will be held for the first time in 2014. The first chapter of the document gives a brief overview, outlining the purpose and objective of the competition, the methodological approach taken by the RoCKIn project, the user story upon which the competition is based, the structure and organization of the competition, and the commonalities and differences with the RoboCup@Work competition, which served as inspiration for RoCKIn@Work. The second chapter provides details on the user story and analyzes the scientific and technical challenges it poses.
Consecutive chapters detail the competition scenario, the competition design, and the organization of the competition. The appendices contain information on a library of functionalities, which we believe are needed, or at least useful, for building competition entries, details on the scenario construction, and a detailed account of the benchmarking infrastructure needed (and provided by RoCKIn).

[BibTex]

D2.1.1 RoCKIn@Home - A Competition for Domestic Service Robots Competition Design, Rule Book, and Scenario Construction

Ahmad, A., Awaad, I., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schneider, S.

(FP7-ICT-601012 Revision 0.7), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, sep 2013 (techreport)

Abstract
RoCKIn is an EU-funded project aiming to foster scientific progress and innovation in cognitive systems and robotics through the design and implementation of competitions. An additional objective of RoCKIn is to increase public awareness of the current state of the art in robotics in Europe and to demonstrate the innovation potential of robotics applications for solving societal challenges and improving the competitiveness of Europe in the global markets. In order to achieve these objectives, RoCKIn develops two competitions, one for domestic service robots (RoCKIn@Home) and one for industrial robots in factories (RoCKIn@Work). These competitions are designed around challenges that are based on easy-to-communicate and convincing user stories, which catch the interest of both the general public and the scientific community. The latter is in particular interested in solving open scientific challenges and in thoroughly assessing, comparing, and evaluating the developed approaches with competing ones. To allow this to happen, the competitions are designed to meet the requirements of benchmarking procedures and good experimental methods. The integration of benchmarking technology with the competition concept is one of the main objectives of RoCKIn. This document describes the first version of the RoCKIn@Home competition, which will be held for the first time in 2014. The first chapter of the document gives a brief overview, outlining the purpose and objective of the competition, the methodological approach taken by the RoCKIn project, the user story upon which the competition is based, the structure and organization of the competition, and the commonalities and differences with the RoboCup@Home competition, which served as inspiration for RoCKIn@Home. The second chapter provides details on the user story and analyzes the scientific and technical challenges it poses.
Consecutive chapters detail the competition scenario, the competition design, and the organization of the competition. The appendices contain information on a library of functionalities, which we believe are needed, or at least useful, for building competition entries, details on the scenario construction, and a detailed account of the benchmarking infrastructure needed (and provided by RoCKIn).

[BibTex]

Distribution Fields with Adaptive Kernels for Large Displacement Image Alignment

Mears, B., Sevilla-Lara, L., Learned-Miller, E.

In British Machine Vision Conference (BMVC), BMVA Press, September 2013 (inproceedings)

Abstract
While region-based image alignment algorithms that use gradient descent can achieve sub-pixel accuracy when they converge, their convergence depends on the smoothness of the image intensity values. Image smoothness is often enforced through the use of multiscale approaches in which images are smoothed and downsampled. Yet, these approaches typically use fixed smoothing parameters which may be appropriate for some images but not for others. Even for a particular image, the optimal smoothing parameters may depend on the magnitude of the transformation. When the transformation is large, the image should be smoothed more than when the transformation is small. Further, with gradient-based approaches, the optimal smoothing parameters may change with each iteration as the algorithm proceeds towards convergence. We address convergence issues related to the choice of smoothing parameters by deriving a Gauss-Newton gradient descent algorithm based on distribution fields (DFs) and proposing a method to dynamically select smoothing parameters at each iteration. DF and DF-like representations have previously been used in the context of tracking. In this work we incorporate DFs into a full affine model for region-based alignment and simultaneously search over parameterized sets of geometric and photometric transforms. We use a probabilistic interpretation of DFs to select smoothing parameters at each step in the optimization and show that this results in improved convergence rates.

pdf code [BibTex]

Metric Regression Forests for Human Pose Estimation

(Best Science Paper Award)

Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., Fitzgibbon, A.

In British Machine Vision Conference (BMVC), BMVA Press, September 2013 (inproceedings)

pdf [BibTex]

D1.1 Specification of General Features of Scenarios and Robots for Benchmarking Through Competitions

Ahmad, A., Awaad, I., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Fontana, G., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schiaffonati, V., Schneider, S.

(FP7-ICT-601012 Revision 1.0), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, July 2013 (techreport)

Abstract
RoCKIn is an EU-funded project aiming to foster scientific progress and innovation in cognitive systems and robotics through the design and implementation of competitions. An additional objective of RoCKIn is to increase public awareness of the current state of the art in robotics and the innovation potential of robotics applications. From these objectives several requirements for the work performed in RoCKIn can be derived: The RoCKIn competitions must start from convincing, easy-to-communicate user stories that catch the attention of relevant stakeholders, the media, and the crowd. The user stories play the role of a mid- to long-term vision for a competition. Preferably, the user stories address economic, societal, or environmental problems. The RoCKIn competitions must pose open scientific challenges of interest to sufficiently many researchers to attract existing and new teams of robotics researchers for participation in the competition. The competitions need to promise some suitable reward, such as recognition in the scientific community, publicity for a team’s work, awards, or prize money, to justify the effort a team puts into the development of a competition entry. The competitions should be designed in such a way that they reward general, scientifically sound solutions to the challenge problems; such general solutions should score better than approaches that work only in narrowly defined contexts and are considered over-engineered. The challenges motivating the RoCKIn competitions must be broken down into suitable intermediate goals that can be reached with a limited team effort until the next competition and the project duration. The RoCKIn competitions must be well-defined and well-designed, with comprehensive rule books and instructions for the participants in order to guarantee a fair competition.
The RoCKIn competitions must integrate competitions with benchmarking in order to provide comprehensive feedback for the teams about the suitability of particular functional modules, their overall architecture, and system integration. This document takes the first steps towards the RoCKIn goals. After outlining our approach, we present several user stories for further discussion within the community. The main objectives of this document are to identify and document relevant scenario features and the tasks and functionalities subject to benchmarking in the competitions.

[BibTex]

SocRob-MSL 2013 Team Description Paper for Middle Sized League

Messias, J., Ahmad, A., Reis, J., Serafim, M., Lima, P.

17th Annual RoboCup International Symposium 2013, July 2013 (techreport)

Abstract
This paper describes the status of the SocRob MSL robotic soccer team as required by the RoboCup 2013 qualification procedures. The team’s latest scientific and technical developments, since its last participation in RoboCup MSL, include further advances in cooperative perception; novel communication methods for distributed robotics; progressive deployment of the ROS middleware; improved localization through feature tracking and Mixture MCL; novel planning methods based on Petri nets and decision-theoretic frameworks; and hardware developments in ball-handling/kicking devices.

link (url) [BibTex]

Poselet conditioned pictorial structures

Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.

In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages: 588-595, IEEE, Portland, OR, June 2013 (inproceedings)

pdf DOI Project Page [BibTex]

Occlusion Patterns for Object Class Detection

Pepik, B., Stark, M., Gehler, P., Schiele, B.

In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, June 2013 (inproceedings)

Abstract
Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise – instead, we include the occluder itself into the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.

pdf Project Page [BibTex]

Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization

(CVPR13 Best Paper Runner-Up)

Brubaker, M. A., Geiger, A., Urtasun, R.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2013), pages: 3057-3064, IEEE, Portland, OR, June 2013 (inproceedings)

Abstract
In this paper we propose an affordable solution to self-localization, which utilizes visual odometry and road maps as the only inputs. To this end, we present a probabilistic model as well as an efficient approximate inference algorithm, which is able to utilize distributed computation to meet the real-time requirements of autonomous systems. Because of the probabilistic nature of the model we are able to cope with uncertainty due to noisy visual odometry and inherent ambiguities in the map (e.g., in a Manhattan world). By exploiting freely available, community-developed maps and visual odometry measurements, we are able to localize a vehicle up to 3m after only a few seconds of driving on maps which contain more than 2,150km of drivable roads.

pdf supplementary project page [BibTex]

Human Pose Estimation using Body Parts Dependent Joint Regressors

Dantone, M., Gall, J., Leistner, C., van Gool, L.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3041-3048, IEEE, Portland, OR, USA, June 2013 (inproceedings)

Abstract
In this work, we address the problem of estimating 2D human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have been shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that already takes dependencies between body parts into account for joint localization and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.

pdf DOI Project Page [BibTex]

A fully-connected layered model of foreground and background flow

Sun, D., Wulff, J., Sudderth, E., Pfister, H., Black, M.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2013), pages: 2451-2458, Portland, OR, June 2013 (inproceedings)

Abstract
Layered models allow scene segmentation and motion estimation to be formulated together and to inform one another. Traditional layered motion methods, however, employ fairly weak models of scene structure, relying on locally connected Ising/Potts models which have limited ability to capture long-range correlations in natural scenes. To address this, we formulate a fully-connected layered model that enables global reasoning about the complicated segmentations of real objects. Optimization with fully-connected graphical models is challenging, and our inference algorithm leverages recent work on efficient mean field updates for fully-connected conditional random fields. These methods can be implemented efficiently using high-dimensional Gaussian filtering. We combine these ideas with a layered flow model, and find that the long-range connections greatly improve segmentation into figure-ground layers when compared with locally connected MRF models. Experiments on several benchmark datasets show that the method can recover fine structures and large occlusion regions, with good flow accuracy and much lower computational cost than previous locally-connected layered models.

pdf Supplemental Material Project Page Project Page [BibTex]

Perception-driven multi-robot formation control

Ahmad, A., Nascimento, T., Conceicao, A., Moreira, A., Lima, P.

In pages: 1851-1856, IEEE, May 2013 (inproceedings)

Abstract
Maximizing the performance of cooperative perception of a tracked target by a team of mobile robots while maintaining the team's formation is the core problem addressed in this work. We propose a solution by integrating the controller and the estimator modules in a formation control loop. The controller module is a distributed non-linear model predictive controller and the estimator module is based on a particle filter for cooperative target tracking. We present a formal description of the integration, followed by simulation and real robot results on two different teams of homogeneous robots. The results highlight how our method successfully enables a team of homogeneous robots to minimize the total uncertainty of the tracked target's cooperative estimate while complying with performance criteria such as keeping a pre-set distance between the team-mates and/or the target and obstacle avoidance.

DOI [BibTex]

Cooperative Robot Localization and Target Tracking based on Least Squares Minimization

Ahmad, A., Tipaldi, G., Lima, P., Burgard, W.

In pages: 5696-5701, IEEE, May 2013 (inproceedings)

Abstract
In this paper we address the problem of cooperative localization and target tracking with a team of moving robots. We model the problem as a least squares minimization problem and show that this problem can be efficiently solved using sparse optimization methods. To achieve this, we represent the problem as a graph, where the nodes are robot and target poses at individual time-steps and the edges are their relative measurements. Static landmarks at known positions are used to define a common reference frame for the robots and the targets. In this way, we mitigate the risk of using measurements and state estimates more than once, since all the relative measurements are i.i.d. and no marginalization is performed. Experiments performed using a set of real robots show higher accuracy compared to a Kalman filter.
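
The graph formulation can be illustrated with a toy 1D problem in which every edge is a linear relative measurement between two nodes. The sketch below is an assumption-laden simplification, not the paper's solver: it uses dense normal equations and Gaussian elimination instead of sparse optimization, and the names `solve_least_squares`, `r0`, `r1`, `t0`, and `t1` are hypothetical.

```python
from typing import Dict, List, Tuple

# One graph edge: coefficients of the unknowns involved and the measured value.
Measurement = Tuple[Dict[str, float], float]


def solve_least_squares(
    unknowns: List[str], measurements: List[Measurement]
) -> Dict[str, float]:
    """Solve a small linear least squares problem via the normal equations."""
    n = len(unknowns)
    index = {name: i for i, name in enumerate(unknowns)}
    ata = [[0.0] * n for _ in range(n)]  # accumulates A^T A
    atb = [0.0] * n                      # accumulates A^T b
    for coeffs, value in measurements:
        for a, ca in coeffs.items():
            atb[index[a]] += ca * value
            for b, cb in coeffs.items():
                ata[index[a]][index[b]] += ca * cb
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        atb[col], atb[pivot] = atb[pivot], atb[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            atb[row] -= factor * atb[col]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        rest = sum(ata[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (atb[row] - rest) / ata[row][row]
    return dict(zip(unknowns, x))


# Toy graph: a landmark at known position 0 anchors the reference frame.
measurements: List[Measurement] = [
    ({"r0": -1.0}, -1.0),            # landmark - r0 = -1
    ({"r1": -1.0}, -2.0),            # landmark - r1 = -2
    ({"r1": 1.0, "r0": -1.0}, 1.0),  # odometry: r1 - r0 = 1
    ({"t0": 1.0, "r0": -1.0}, 2.0),  # target seen from r0: t0 - r0 = 2
    ({"t1": 1.0, "r1": -1.0}, 2.0),  # target seen from r1: t1 - r1 = 2
]
estimate = solve_least_squares(["r0", "r1", "t0", "t1"], measurements)
```

Each measurement enters the system exactly once as its own edge, which mirrors the abstract's point about not reusing measurements or marginalized state estimates.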

DOI [BibTex]

Unknown-color spherical object detection and tracking

Troppan, A., Guerreiro, E., Celiberti, F., Santos, G., Ahmad, A., Lima, P.

In pages: 1-4, IEEE, April 2013 (inproceedings)

Abstract
Detection and tracking of an unknown-color spherical object in a partially-known environment using a robot with a single camera is the core problem addressed in this article. A novel color detection mechanism, which exploits the geometrical properties of the spherical object's projection onto the image plane, precedes the object's detection process. A Kalman filter-based tracker uses the object detection in its update step and tracks the spherical object. Real robot experimental evaluation of the proposed method is presented on soccer robots detecting and tracking an unknown-color ball.

DOI [BibTex]
