Header logo is ps


2013


Occlusion Patterns for Object Class Detection
Occlusion Patterns for Object Class Detection

Pepik, B., Stark, M., Gehler, P., Schiele, B.

In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, June 2013 (inproceedings)

Abstract
Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion re- mains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of meth- ods that treat occlusion as just another source of noise – instead, we include the occluder itself into the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistica- tion. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid fur- ther developments in tackling the occlusion challenge.

pdf Project Page [BibTex]

2013

pdf Project Page [BibTex]


Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization
Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization

(CVPR13 Best Paper Runner-Up)

Brubaker, M. A., Geiger, A., Urtasun, R.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2013), pages: 3057-3064, IEEE, Portland, OR, June 2013 (inproceedings)

Abstract
In this paper we propose an affordable solution to self- localization, which utilizes visual odometry and road maps as the only inputs. To this end, we present a probabilis- tic model as well as an efficient approximate inference al- gorithm, which is able to utilize distributed computation to meet the real-time requirements of autonomous systems. Because of the probabilistic nature of the model we are able to cope with uncertainty due to noisy visual odometry and inherent ambiguities in the map ( e.g ., in a Manhattan world). By exploiting freely available, community devel- oped maps and visual odometry measurements, we are able to localize a vehicle up to 3m after only a few seconds of driving on maps which contain more than 2,150km of driv- able roads.

pdf supplementary project page [BibTex]

pdf supplementary project page [BibTex]


Human Pose Estimation using Body Parts Dependent Joint Regressors
Human Pose Estimation using Body Parts Dependent Joint Regressors

Dantone, M., Gall, J., Leistner, C., van Gool, L.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3041-3048, IEEE, Portland, OR, USA, June 2013 (inproceedings)

Abstract
In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


A fully-connected layered model of foreground and background flow
A fully-connected layered model of foreground and background flow

Sun, D., Wulff, J., Sudderth, E., Pfister, H., Black, M.

In IEEE Conf. on Computer Vision and Pattern Recognition, (CVPR 2013), pages: 2451-2458, Portland, OR, June 2013 (inproceedings)

Abstract
Layered models allow scene segmentation and motion estimation to be formulated together and to inform one another. Traditional layered motion methods, however, employ fairly weak models of scene structure, relying on locally connected Ising/Potts models which have limited ability to capture long-range correlations in natural scenes. To address this, we formulate a fully-connected layered model that enables global reasoning about the complicated segmentations of real objects. Optimization with fully-connected graphical models is challenging, and our inference algorithm leverages recent work on efficient mean field updates for fully-connected conditional random fields. These methods can be implemented efficiently using high-dimensional Gaussian filtering. We combine these ideas with a layered flow model, and find that the long-range connections greatly improve segmentation into figure-ground layers when compared with locally connected MRF models. Experiments on several benchmark datasets show that the method can recover fine structures and large occlusion regions, with good flow accuracy and much lower computational cost than previous locally-connected layered models.

pdf Supplemental Material Project Page Project Page [BibTex]

pdf Supplemental Material Project Page Project Page [BibTex]


no image
Perception-driven multi-robot formation control

Ahmad, A., Nascimento, T., Conceicao, A., Moreira, A., Lima, P.

In pages: 1851-1856, IEEE, May 2013 (inproceedings)

Abstract
Maximizing the performance of cooperative perception of a tracked target by a team of mobile robots while maintaining the team's formation is the core problem addressed in this work. We propose a solution by integrating the controller and the estimator modules in a formation control loop. The controller module is a distributed non-linear model predictive controller and the estimator module is based on a particle filter for cooperative target tracking. A formal description of the integration followed by simulation and real robot results on two different teams of homogeneous robots are presented. The results highlight how our method successfully enables a team of homogeneous robots to minimize the total uncertainty of the tracked target's cooperative estimate while complying with the performance criteria such as keeping a pre-set distance between the team-mates and/or the target and obstacle avoidance.

DOI [BibTex]

DOI [BibTex]


no image
Cooperative Robot Localization and Target Tracking based on Least Squares Minimization

Ahmad, A., Tipaldi, G., Lima, P., Burgard, W.

In pages: 5696-5701, IEEE, May 2013 (inproceedings)

Abstract
In this paper we address the problem of cooperative localization and target tracking with a team of moving robots. We model the problem as a least squares minimization problem and show that this problem can be efficiently solved using sparse optimization methods. To achieve this, we represent the problem as a graph, where the nodes are robot and target poses at individual time-steps and the edges are their relative measurements. Static landmarks at known position are used to define a common reference frame for the robots and the targets. In this way, we mitigate the risk of using measurements and state estimates more than once, since all the relative measurements are i.i.d. and no marginalization is performed. Experiments performed using a set of real robots show higher accuracy compared to a Kalman filter.

DOI [BibTex]

DOI [BibTex]


Unscented Kalman Filtering on Riemannian Manifolds
Unscented Kalman Filtering on Riemannian Manifolds

Soren Hauberg, Francois Lauze, Kim S. Pedersen

Journal of Mathematical Imaging and Vision, 46(1):103-120, Springer Netherlands, May 2013 (article)

Publishers site PDF [BibTex]

Publishers site PDF [BibTex]


no image
Unknown-color spherical object detection and tracking

Troppan, A., Guerreiro, E., Celiberti, F., Santos, G., Ahmad, A., Lima, P.

In pages: 1-4, IEEE, April 2013 (inproceedings)

Abstract
Detection and tracking of an unknown-color spherical object in a partially-known environment using a robot with a single camera is the core problem addressed in this article. A novel color detection mechanism, which exploits the geometrical properties of the spherical object's projection onto the image plane, precedes the object's detection process. A Kalman filter-based tracker uses the object detection in its update step and tracks the spherical object. Real robot experimental evaluation of the proposed method is presented on soccer robots detecting and tracking an unknown-color ball.

DOI [BibTex]

DOI [BibTex]


Quasi-Newton Methods: A New Direction
Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

Journal of Machine Learning Research, 14(1):843-865, March 2013 (article)

Abstract
Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.

website+code pdf link (url) [BibTex]

website+code pdf link (url) [BibTex]


Simple, fast, accurate melanocytic lesion segmentation in 1D colour space
Simple, fast, accurate melanocytic lesion segmentation in 1D colour space

Peruch, F., Bogo, F., Bonazza, M., Bressan, M., Cappelleri, V., Peserico, E.

In VISAPP (1), pages: 191-200, Barcelona, February 2013 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Estimating Human Pose with Flowing Puppets
Estimating Human Pose with Flowing Puppets

Zuffi, S., Romero, J., Schmid, C., Black, M. J.

In IEEE International Conference on Computer Vision (ICCV), pages: 3312-3319, 2013 (inproceedings)

Abstract
We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that "links" articulated shape models of people in adjacent frames trough the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting "flowing puppets" provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance.

pdf code data DOI Project Page Project Page Project Page [BibTex]

pdf code data DOI Project Page Project Page Project Page [BibTex]


Simultaneous Cast Shadows, Illumination and Geometry Inference Using   Hypergraphs
Simultaneous Cast Shadows, Illumination and Geometry Inference Using Hypergraphs

Panagopoulos, A., Wang, C., Samaras, D., Paragios, N.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(2):437-449, 2013 (article)

pdf [BibTex]

pdf [BibTex]


no image
Right Ventricle Segmentation by Temporal Information Constrained Gradient Vector Flow

X. Yang, S. Y. Yeo, Y. Su, C. Lim, M. Wan, L. Zhong, R. S. Tan

In IEEE International Conference on Systems, Man, and Cybernetics, 2013 (inproceedings)

Abstract
Evaluation of right ventricular (RV) structure and function is of importance in the management of most cardiac disorders. But the segmentation of RV has always been consid- ered challenging due to low contrast of the myocardium with surrounding and high shape variability of the RV. In this paper, we present a 2D + T active contour model for segmentation and tracking of RV endocardium on cardiac magnetic resonance (MR) images. To take into account the temporal information between adjacent frames, we propose to integrate the time-dependent constraints into the energy functional of the classical gradient vector flow (GVF). As a result, the prior motion knowledge of RV is introduced in the deformation process through the time-dependent constraints in the proposed GVF-T model. A weighting parameter is introduced to adjust the weight of the temporal information against the image data itself. The additional external edge forces retrieved from the temporal constraints may be useful for the RV segmentation, such that lead to a better segmentation performance. The effectiveness of the proposed approach is supported by experimental results on synthetic and cardiac MR images.

[BibTex]

[BibTex]


A Comparison of Directional Distances for Hand Pose Estimation
A Comparison of Directional Distances for Hand Pose Estimation

Tzionas, D., Gall, J.

In German Conference on Pattern Recognition (GCPR), 8142, pages: 131-141, Lecture Notes in Computer Science, (Editors: Weickert, Joachim and Hein, Matthias and Schiele, Bernt), Springer, 2013 (inproceedings)

Abstract
Benchmarking methods for 3d hand tracking is still an open problem due to the difficulty of acquiring ground truth data. We introduce a new dataset and benchmarking protocol that is insensitive to the accumulative error of other protocols. To this end, we create testing frame pairs of increasing difficulty and measure the pose estimation error separately for each of them. This approach gives new insights and allows to accurately study the performance of each feature or method without employing a full tracking pipeline. Following this protocol, we evaluate various directional distances in the context of silhouette-based 3d hand tracking, expressed as special cases of a generalized Chamfer distance form. An appropriate parameter setup is proposed for each of them, and a comparative study reveals the best performing method in this context.

pdf Supplementary Project Page link (url) DOI Project Page [BibTex]

pdf Supplementary Project Page link (url) DOI Project Page [BibTex]


{Dynamic Probabilistic Volumetric Models}
Dynamic Probabilistic Volumetric Models

Ulusoy, A. O., Biris, O., Mundy, J. L.

In ICCV, pages: 505-512, 2013 (inproceedings)

Abstract
This paper presents a probabilistic volumetric framework for image based modeling of general dynamic 3-d scenes. The framework is targeted towards high quality modeling of complex scenes evolving over thousands of frames. Extensive storage and computational resources are required in processing large scale space-time (4-d) data. Existing methods typically store separate 3-d models at each time step and do not address such limitations. A novel 4-d representation is proposed that adaptively subdivides in space and time to explain the appearance of 3-d dynamic surfaces. This representation is shown to achieve compression of 4-d data and provide efficient spatio-temporal processing. The advances of the proposed framework is demonstrated on standard datasets using free-viewpoint video and 3-d tracking applications.

video pdf DOI [BibTex]

video pdf DOI [BibTex]


Model Reconstruction of Patient-Specific Cardiac Mesh from Segmented Contour Lines
Model Reconstruction of Patient-Specific Cardiac Mesh from Segmented Contour Lines

C. W. Lim, Y. Su, S. Y. Yeo, G. M. Ng, V. T. Nguyen, L. Zhong, R. S. Tan, K. K. Poh, P. Chai,

In Asia Pacific Congress on Computational Mechanics, 2013 (inproceedings)

Abstract
We propose an automatic algorithm for the reconstruction of a set of patient-specific dynamic cardiac mesh model with 1-to-1 mesh correspondence over the whole cardiac cycle. This work focus on both the reconstruction technique of the initial 3D model of the heart and also the consistent mapping of the vertex positions throughout all the 3D meshes. This process is technically more challenging due to the wide interval spacing between MRI images as compared to CT images, making overlapping blood vessels much harder to discern. We propose a tree-based connectivity data structure to perform a filtering process to eliminate weak connections between contours on adjacent slices. The reconstructed 3D model from the first time step is used as a base template model, and deformed to fit the segmented contours in the next time step. Our algorithm has been tested on an actual acquisition of cardiac MRI images over one cardiac cycle.

[BibTex]

[BibTex]


A Generic Deformation Model for Dense Non-Rigid Surface Registration: a Higher-Order MRF-based Approach
A Generic Deformation Model for Dense Non-Rigid Surface Registration: a Higher-Order MRF-based Approach

Zeng, Y., Wang, C., Gu, X., Samaras, D., Paragios, N.

In IEEE International Conference on Computer Vision (ICCV), pages: 3360~3367, 2013 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Nonlinearly Constrained {MRFs}: Exploring the Intrinsic Dimensions   of Higher-Order Cliques
Nonlinearly Constrained MRFs: Exploring the Intrinsic Dimensions of Higher-Order Cliques

Zeng, Y., Wang, C., Soatto, S., Yau, S.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Random Forests for Real Time {3D} Face Analysis
Random Forests for Real Time 3D Face Analysis

Fanelli, G., Dantone, M., Gall, J., Fossati, A., van Gool, L.

International Journal of Computer Vision, 101(3):437-458, Springer, 2013 (article)

Abstract
We present a random forest-based framework for real time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach, where each patch extracted from the depth image can directly cast a vote for the head pose or each of the facial features. Our system proves capable of handling large rotations, partial occlusions, and the noisy depth data acquired using commercial sensors. Moreover, the algorithm works on each frame independently and achieves real time performance without resorting to parallel computations on a GPU. We present extensive experiments on publicly available, challenging datasets and present a new annotated head pose database recorded using a Microsoft Kinect.

data and code publisher's site pdf DOI Project Page [BibTex]

data and code publisher's site pdf DOI Project Page [BibTex]


Markerless Motion Capture of Multiple Characters Using Multi-view Image Segmentation
Markerless Motion Capture of Multiple Characters Using Multi-view Image Segmentation

Liu, Y., Gall, J., Stoll, C., Dai, Q., Seidel, H., Theobalt, C.

Transactions on Pattern Analysis and Machine Intelligence, 35(11):2720-2735, 2013 (article)

Abstract
Capturing the skeleton motion and detailed time-varying surface geometry of multiple, closely interacting peoples is a very challenging task, even in a multicamera setup, due to frequent occlusions and ambiguities in feature-to-person assignments. To address this task, we propose a framework that exploits multiview image segmentation. To this end, a probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Given the articulated template models of each person and the labeled pixels, a combined optimization scheme, which splits the skeleton pose optimization problem into a local one and a lower dimensional global one, is applied one by one to each individual, followed with surface estimation to capture detailed nonrigid deformations. We show on various sequences that our approach can capture the 3D motion of humans accurately even if they move rapidly, if they wear wide apparel, and if they are engaged in challenging multiperson motions, including dancing, wrestling, and hugging.

data and video pdf DOI Project Page [BibTex]

data and video pdf DOI Project Page [BibTex]


Viewpoint and pose in body-form adaptation
Viewpoint and pose in body-form adaptation

Sekunova, A., Black, M., Parkinson, L., Barton, J. J. S.

Perception, 42(2):176-186, 2013 (article)

Abstract
Faces and bodies are complex structures, perception of which can play important roles in person identification and inference of emotional state. Face representations have been explored using behavioural adaptation: in particular, studies have shown that face aftereffects show relatively broad tuning for viewpoint, consistent with origin in a high-level structural descriptor far removed from the retinal image. Our goals were to determine first, if body aftereffects also showed a degree of viewpoint invariance, and second if they also showed pose invariance, given that changes in pose create even more dramatic changes in the 2-D retinal image. We used a 3-D model of the human body to generate headless body images, whose parameters could be varied to generate different body forms, viewpoints, and poses. In the first experiment, subjects adapted to varying viewpoints of either slim or heavy bodies in a neutral stance, followed by test stimuli that were all front-facing. In the second experiment, we used the same front-facing bodies in neutral stance as test stimuli, but compared adaptation from bodies in the same neutral stance to adaptation with the same bodies in different poses. We found that body aftereffects were obtained over substantial viewpoint changes, with no significant decline in aftereffect magnitude with increasing viewpoint difference between adapting and test images. Aftereffects also showed transfer across one change in pose but not across another. We conclude that body representations may have more viewpoint invariance than faces, and demonstrate at least some transfer across pose, consistent with a high-level structural description. Keywords: aftereffect, shape, face, representation

pdf from publisher abstract pdf link (url) Project Page [BibTex]

pdf from publisher abstract pdf link (url) Project Page [BibTex]


Reconstructing patient-specific cardiac models from contours via Delaunay triangulation and graph-cuts
Reconstructing patient-specific cardiac models from contours via Delaunay triangulation and graph-cuts

Min Wan, Calvin Lim, Junmei Zhang, Yi Su, Si Yong Yeo, Desheng Wang, Ru San Tan, Liang Zhong

In International Conference of the IEEE Engineering in Medicine and Biology Society, pages: 2976-9, 2013 (inproceedings)

[BibTex]

[BibTex]


Regional comparison of left ventricle systolic wall stress reveals intraregional uniformity in healthy subjects
Regional comparison of left ventricle systolic wall stress reveals intraregional uniformity in healthy subjects

Soo Kng Teo, Si Yong Yeo, May Ling Tan, Chi Wan Lim, Liang Zhong, Ru San Tan, Yi Su

In Computing in Cardiology Conference, pages: 575 - 578, 2013 (inproceedings)

Abstract
This study aimed to assess the feasibility of using the regional uniformity of the left ventricle (LV) wall stress (WS) to diagnose patients with myocardial infarction. We present a novel method using a similarity map that measures the degree of uniformity in nominal systolic WS across pairs of segments within the same patient. The values of the nominal WS are computed at each vertex point from a 1-to-1 corresponding mesh pair of the LV at the end-diastole (ED) and end-systole (ES) phases. The 3D geometries of the LV at ED and ES are reconstructed from border-delineated MRI images and the 1-to-1 mesh generated using a strain-energy minimization approach. The LV is then partitioned into 16 segments based on published clinical standard and the nominal WS histogram distribution for each of the segment was computed. A similarity index is then computed for each pair of histogram distributions to generate a 16-by-16 similarity map. Based on our initial study involving 12 MI patients and 9 controls, we observed uniformity for intra- regional comparisons in the controls compared against the patients. Our results suggest that the regional uniformity of the nominal systolic WS in the form of a similarity map can potentially be used as a discriminant between MI patients and normal controls.

[BibTex]

[BibTex]


Non-parametric hand pose estimation with object context
Non-parametric hand pose estimation with object context

Romero, J., Kjellström, H., Ek, C. H., Kragic, D.

Image and Vision Computing , 31(8):555 - 564, 2013 (article)

Abstract
In the spirit of recent work on contextual recognition and estimation, we present a method for estimating the pose of human hands, employing information about the shape of the object in the hand. Despite the fact that most applications of human hand tracking involve grasping and manipulation of objects, the majority of methods in the literature assume a free hand, isolated from the surrounding environment. Occlusion of the hand from grasped objects does in fact often pose a severe challenge to the estimation of hand pose. In the presented method, object occlusion is not only compensated for, it contributes to the pose estimation in a contextual fashion; this without an explicit model of object shape. Our hand tracking method is non-parametric, performing a nearest neighbor search in a large database (.. entries) of hand poses with and without grasped objects. The system that operates in real time, is robust to self occlusions, object occlusions and segmentation errors, and provides full hand pose reconstruction from monocular video. Temporal consistency in hand pose is taken into account, without explicitly tracking the hand in the high-dim pose space. Experiments show the non-parametric method to outperform other state of the art regression methods, while operating at a significantly lower computational cost than comparable model-based hand tracking methods.

Publisher site pdf link (url) [BibTex]

Publisher site pdf link (url) [BibTex]

2012


Assessment of Computational Visual Attention Models on Medical Images
Assessment of Computational Visual Attention Models on Medical Images

Jampani, V., Ujjwal, , Sivaswamy, J., Vaidya, V.

Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing, pages: 80:1-80:8, ACM, Mumbai, India, December 2012 (conference)

Abstract
Visual attention plays a major role in our lives. Our very perception (which very much decides our survival) depends on it - like perceiving a predator while walking through a forest, perceiving a fast car coming from the front on a busy road or even spotting our favorite color out of the many colors. In Medical Imaging, where medical experts have to take major clinical decisions based on the examination of images of various kinds (CT, MRI etc), visual attention plays a pivotal role. It makes the medical experts fixate on any abnormal behavior exhibited in the medical image and helps in speedy diagnosis. Many previous works (see the paper for details) have exhibited this important fact and the model proposed by Nodine and Kundel highlights the important role of visual attention in medical image diagnosis. Visual attention involves two components - Bottom-Up and Top-Down.In the present work, we examine a number of established computational models of visual attention in the context of chest X-rays (infected with Pneumoconiosis) and retinal images (having hard exudates). The fundamental motivation is to try to understand the applicability of visual attention models in the context of different types of abnormalities. Our assessment of four popular visual attention models, is extensive and shows that they are able to pick up abnormal features reasonably well. We compare the models towards detecting subtle abnormalities and high-contrast lesions. Although significant scope of improvements exists especially in picking up more subtle abnormalities and getting more selective towards picking up more abnormalities and less normal structures, the presented assessment shows that visual attention indeed shows a promise for inclusion in the main field of medical image analysis

url pdf poster link (url) [BibTex]

2012

url pdf poster link (url) [BibTex]


An SVD-Based Approach for Ghost Detection and Removal in High Dynamic Range Images
An SVD-Based Approach for Ghost Detection and Removal in High Dynamic Range Images

Srikantha, A., Sidibe, D., Meriaudeau, F.

International Conference on Pattern Recognition (ICPR), pages: 380-383, November 2012 (article)

pdf [BibTex]

pdf [BibTex]


Lie Bodies: A Manifold Representation of {3D} Human Shape
Lie Bodies: A Manifold Representation of 3D Human Shape

Freifeld, O., Black, M. J.

In European Conf. on Computer Vision (ECCV), pages: 1-14, Part I, LNCS 7572, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Three-dimensional object shape is commonly represented in terms of deformations of a triangular mesh from an exemplar shape. Existing models, however, are based on a Euclidean representation of shape deformations. In contrast, we argue that shape has a manifold structure: For example, summing the shape deformations for two people does not necessarily yield a deformation corresponding to a valid human shape, nor does the Euclidean difference of these two deformations provide a meaningful measure of shape dissimilarity. Consequently, we define a novel manifold for shape representation, with emphasis on body shapes, using a new Lie group of deformations. This has several advantages. First we define triangle deformations exactly, removing non-physical deformations and redundant degrees of freedom common to previous methods. Second, the Riemannian structure of Lie Bodies enables a more meaningful definition of body shape similarity by measuring distance between bodies on the manifold of body shape deformations. Third, the group structure allows the valid composition of deformations. This is important for models that factor body shape deformations into multiple causes or represent shape as a linear combination of basis shapes. Finally, body shape variation is modeled using statistics on manifolds. Instead of modeling Euclidean shape variation with Principal Component Analysis we capture shape variation on the manifold using Principal Geodesic Analysis. Our experiments show consistent visual and quantitative advantages of Lie Bodies over traditional Euclidean models of shape deformation and our representation can be easily incorporated into existing methods.

pdf supplemental material youtube poster eigenshape video code Project Page Project Page Project Page [BibTex]

pdf supplemental material youtube poster eigenshape video code Project Page Project Page Project Page [BibTex]


Coregistration: Simultaneous alignment and modeling of articulated {3D} shape
Coregistration: Simultaneous alignment and modeling of articulated 3D shape

Hirshberg, D., Loper, M., Rachlin, E., Black, M.

In European Conf. on Computer Vision (ECCV), pages: 242-255, LNCS 7577, Part IV, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Three-dimensional (3D) shape models are powerful because they enable the inference of object shape from incomplete, noisy, or ambiguous 2D or 3D data. For example, realistic parameterized 3D human body models have been used to infer the shape and pose of people from images. To train such models, a corpus of 3D body scans is typically brought into registration by aligning a common 3D human-shaped template to each scan. This is an ill-posed problem that typically involves solving an optimization problem with regularization terms that penalize implausible deformations of the template. When aligning a corpus, however, we can do better than generic regularization. If we have a model of how the template can deform then alignments can be regularized by this model. Constructing a model of deformations, however, requires having a corpus that is already registered. We address this chicken-and-egg problem by approaching modeling and registration together. By minimizing a single objective function, we reliably obtain high quality registration of noisy, incomplete, laser scans, while simultaneously learning a highly realistic articulated body model. The model greatly improves robustness to noise and missing data. Since the model explains a corpus of body scans, it captures how body shape varies across people and poses.

pdf publisher site poster supplemental material (400MB) Project Page Project Page [BibTex]

pdf publisher site poster supplemental material (400MB) Project Page Project Page [BibTex]


Coupled Action Recognition and Pose Estimation from Multiple Views
Coupled Action Recognition and Pose Estimation from Multiple Views

Yao, A., Gall, J., van Gool, L.

International Journal of Computer Vision, 100(1):16-37, October 2012 (article)

publisher's site code pdf Project Page Project Page Project Page [BibTex]

publisher's site code pdf Project Page Project Page Project Page [BibTex]


Lessons and insights from creating a synthetic optical flow benchmark
Lessons and insights from creating a synthetic optical flow benchmark

Wulff, J., Butler, D. J., Stanley, G. B., Black, M. J.

In ECCV Workshop on Unsolved Problems in Optical Flow and Stereo Estimation, pages: 168-177, Part II, LNCS 7584, (Editors: A. Fusiello et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

pdf dataset poster youtube Project Page [BibTex]

pdf dataset poster youtube Project Page [BibTex]


3D2PM {--} 3D Deformable Part Models
3D2PM – 3D Deformable Part Models

Pepik, B., Gehler, P., Stark, M., Schiele, B.

In Proceedings of the European Conference on Computer Vision (ECCV), pages: 356-370, Lecture Notes in Computer Science, (Editors: Fitzgibbon, Andrew W. and Lazebnik, Svetlana and Perona, Pietro and Sato, Yoichi and Schmid, Cordelia), Springer, Firenze, October 2012 (inproceedings)

pdf video poster Project Page [BibTex]

pdf video poster Project Page [BibTex]


A naturalistic open source movie for optical flow evaluation
A naturalistic open source movie for optical flow evaluation

Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J.

In European Conf. on Computer Vision (ECCV), pages: 611-625, Part IV, LNCS 7577, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)

Abstract
Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image- and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.

pdf dataset youtube talk supplemental material Project Page Project Page [BibTex]

pdf dataset youtube talk supplemental material Project Page Project Page [BibTex]


{Characterization of 3-D Volumetric Probabilistic Scenes for Object Recognition}
Characterization of 3-D Volumetric Probabilistic Scenes for Object Recognition

Restrepo, M. I., Mayer, B. A., Ulusoy, A. O., Mundy, J. L.

In Selected Topics in Signal Processing, IEEE Journal of, 6(5):522-537, September 2012 (inproceedings)

Abstract
This paper presents a new volumetric representation for categorizing objects in large-scale 3-D scenes reconstructed from image sequences. This work uses a probabilistic volumetric model (PVM) that combines the ideas of background modeling and volumetric multi-view reconstruction to handle the uncertainty inherent in the problem of reconstructing 3-D structures from 2-D images. The advantages of probabilistic modeling have been demonstrated by recent application of the PVM representation to video image registration, change detection and classification of changes based on PVM context. The applications just mentioned, operate on 2-D projections of the PVM. This paper presents the first work to characterize and use the local 3-D information in the scenes. Two approaches to local feature description are proposed and compared: 1) features derived from a PCA analysis of model neighborhoods; and 2) features derived from the coefficients of a 3-D Taylor series expansion within each neighborhood. The resulting description is used in a bag-of-features approach to classify buildings, houses, cars, planes, and parking lots learned from aerial imagery collected over Providence, RI. It is shown that both feature descriptions explain the data with similar accuracy and their effectiveness for dense-feature categorization is compared for the different classes. Finally, 3-D extensions of the Harris corner detector and a Hessian-based detector are used to detect salient features. Both types of salient features are evaluated through object categorization experiments, where only features with maximal response are retained. For most saliency criteria tested, features based on the determinant of the Hessian achieved higher classification accuracy than Harris-based features.

pdf DOI [BibTex]

pdf DOI [BibTex]


A framework for relating neural activity to freely moving behavior
A framework for relating neural activity to freely moving behavior

Foster, J. D., Nuyujukian, P., Freifeld, O., Ryu, S., Black, M. J., Shenoy, K. V.

In 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’12), pages: 2736 -2739 , IEEE, San Diego, August 2012 (inproceedings)

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Pottics {--} The Potts Topic Model for Semantic Image Segmentation
Pottics – The Potts Topic Model for Semantic Image Segmentation

Dann, C., Gehler, P., Roth, S., Nowozin, S.

In Proceedings of 34th DAGM Symposium, pages: 397-407, Lecture Notes in Computer Science, (Editors: Pinz, Axel and Pock, Thomas and Bischof, Horst and Leberl, Franz), Springer, August 2012 (inproceedings)

code pdf poster [BibTex]

code pdf poster [BibTex]


Psoriasis segmentation through chromatic regions and Geometric Active Contours
Psoriasis segmentation through chromatic regions and Geometric Active Contours

Bogo, F., Samory, M., Belloni Fortina, A., Piaserico, S., Peserico, E.

In 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’12), pages: 5388-5391, San Diego, August 2012 (inproceedings)

pdf [BibTex]

pdf [BibTex]


PCA-enhanced stochastic optimization methods
PCA-enhanced stochastic optimization methods

Kuznetsova, A., Pons-Moll, G., Rosenhahn, B.

In German Conference on Pattern Recognition (GCPR), August 2012 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Quasi-Newton Methods: A New Direction
Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

In Proceedings of the 29th International Conference on Machine Learning, pages: 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, July 2012 (inproceedings)

Abstract
Four decades after their invention, quasi- Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.

website+code pdf link (url) [BibTex]

website+code pdf link (url) [BibTex]


{DRAPE: DRessing Any PErson}
DRAPE: DRessing Any PErson

Guan, P., Reiss, L., Hirshberg, D., Weiss, A., Black, M. J.

ACM Trans. on Graphics (Proc. SIGGRAPH), 31(4):35:1-35:10, July 2012 (article)

Abstract
We describe a complete system for animating realistic clothing on synthetic bodies of any shape and pose without manual intervention. The key component of the method is a model of clothing called DRAPE (DRessing Any PErson) that is learned from a physics-based simulation of clothing on bodies of different shapes and poses. The DRAPE model has the desirable property of "factoring" clothing deformations due to body shape from those due to pose variation. This factorization provides an approximation to the physical clothing deformation and greatly simplifies clothing synthesis. Given a parameterized model of the human body with known shape and pose parameters, we describe an algorithm that dresses the body with a garment that is customized to fit and possesses realistic wrinkles. DRAPE can be used to dress static bodies or animated sequences with a learned model of the cloth dynamics. Since the method is fully automated, it is appropriate for dressing large numbers of virtual characters of varying shape. The method is significantly more efficient than physical simulation.

YouTube pdf talk Project Page Project Page [BibTex]

YouTube pdf talk Project Page Project Page [BibTex]


Learning Search Based Inference for Object Detection
Learning Search Based Inference for Object Detection

Gehler, P., Lehmann, A.

In International Conference on Machine Learning (ICML) workshop on Inferning: Interactions between Inference and Learning, Edinburgh, Scotland, UK, July 2012, short version of BMVC11 paper (http://ps.is.tue.mpg.de/publications/31/get_file) (inproceedings)

pdf [BibTex]

pdf [BibTex]


Ghost Detection and Removal for High Dynamic Range Images: Recent Advances
Ghost Detection and Removal for High Dynamic Range Images: Recent Advances

Srikantha, A., Sidib’e, D.

Signal Processing: Image Communication, 27, pages: 650-662, July 2012 (article)

pdf link (url) [BibTex]

pdf link (url) [BibTex]


Distribution Fields for Tracking
Distribution Fields for Tracking

Sevilla-Lara, L., Learned-Miller, E.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, June 2012 (inproceedings)

Abstract
Visual tracking of general objects often relies on the assumption that gradient descent of the alignment function will reach the global optimum. A common technique to smooth the objective function is to blur the image. However, blurring the image destroys image information, which can cause the target to be lost. To address this problem we introduce a method for building an image descriptor using distribution fields (DFs), a representation that allows smoothing the objective function without destroying information about pixel values. We present experimental evidence on the superiority of the width of the basin of attraction around the global optimum of DFs over other descriptors. DFs also allow the representation of uncertainty about the tracked object. This helps in disregarding outliers during tracking (like occlusions or small misalignments) without modeling them explicitly. Finally, this provides a convenient way to aggregate the observations of the object through time and maintain an updated model. We present a simple tracking algorithm that uses DFs and obtains state-of-the-art results on standard benchmarks.

pdf Matlab code [BibTex]

pdf Matlab code [BibTex]


From pictorial structures to deformable structures
From pictorial structures to deformable structures

Zuffi, S., Freifeld, O., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3546-3553, IEEE, June 2012 (inproceedings)

Abstract
Pictorial Structures (PS) define a probabilistic model of 2D articulated objects in images. Typical PS models assume an object can be represented by a set of rigid parts connected with pairwise constraints that define the prior probability of part configurations. These models are widely used to represent non-rigid articulated objects such as humans and animals despite the fact that such objects have parts that deform non-rigidly. Here we define a new Deformable Structures (DS) model that is a natural extension of previous PS models and that captures the non-rigid shape deformation of the parts. Each part in a DS model is represented by a low-dimensional shape deformation space and pairwise potentials between parts capture how the shape varies with pose and the shape of neighboring parts. A key advantage of such a model is that it more accurately models object boundaries. This enables image likelihood models that are more discriminative than previous PS likelihoods. This likelihood is learned using training imagery annotated using a DS “puppet.” We focus on a human DS model learned from 2D projections of a realistic 3D human body model and use it to infer human poses in images using a form of non-parametric belief propagation.

pdf sup mat code poster Project Page Project Page Project Page Project Page [BibTex]

pdf sup mat code poster Project Page Project Page Project Page Project Page [BibTex]


Teaching 3D Geometry to Deformable Part Models
Teaching 3D Geometry to Deformable Part Models

Pepik, B., Stark, M., Gehler, P., Schiele, B.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 3362 -3369, IEEE, Providence, RI, USA, June 2012, oral presentation (inproceedings)

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


Visual Servoing on Unknown Objects
Visual Servoing on Unknown Objects

Gratal, X., Romero, J., Bohg, J., Kragic, D.

Mechatronics, 22(4):423-435, Elsevier, June 2012, Visual Servoing \{SI\} (article)

Abstract
We study visual servoing in a framework of detection and grasping of unknown objects. Classically, visual servoing has been used for applications where the object to be servoed on is known to the robot prior to the task execution. In addition, most of the methods concentrate on aligning the robot hand with the object without grasping it. In our work, visual servoing techniques are used as building blocks in a system capable of detecting and grasping unknown objects in natural scenes. We show how different visual servoing techniques facilitate a complete grasping cycle.

Grasping sequence video Offline calibration video Pdf DOI [BibTex]

Grasping sequence video Offline calibration video Pdf DOI [BibTex]


Branch-and-price global optimization for multi-view multi-object tracking
Branch-and-price global optimization for multi-view multi-object tracking

Leal-Taixé, L., Pons-Moll, G., Rosenhahn, B.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2012 (inproceedings)

project page paper poster [BibTex]

project page paper poster [BibTex]


A physically-based approach to reflection separation
A physically-based approach to reflection separation

Kong, N., Tai, Y., Shin, S. Y.

In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 9-16, June 2012 (inproceedings)

Abstract
We propose a physically-based approach to separate reflection using multiple polarized images with a background scene captured behind glass. The input consists of three polarized images, each captured from the same view point but with a different polarizer angle separated by 45 degrees. The output is the high-quality separation of the reflection and background layers from each of the input images. A main technical challenge for this problem is that the mixing coefficient for the reflection and background layers depends on the angle of incidence and the orientation of the plane of incidence, which are spatially-varying over the pixels of an image. Exploiting physical properties of polarization for a double-surfaced glass medium, we propose an algorithm which automatically finds the optimal separation of the reflection and background layers. Thorough experiments, we demonstrate that our approach can generate superior results to those of previous methods.

Publisher site [BibTex]

Publisher site [BibTex]


Visual Orientation and Directional Selectivity Through Thalamic Synchrony
Visual Orientation and Directional Selectivity Through Thalamic Synchrony

Stanley, G., Jin, J., Wang, Y., Desbordes, G., Wang, Q., Black, M., Alonso, J.

Journal of Neuroscience, 32(26):9073-9088, June 2012 (article)

Abstract
Thalamic neurons respond to visual scenes by generating synchronous spike trains on the timescale of 10–20 ms that are very effective at driving cortical targets. Here we demonstrate that this synchronous activity contains unexpectedly rich information about fundamental properties of visual stimuli. We report that the occurrence of synchronous firing of cat thalamic cells with highly overlapping receptive fields is strongly sensitive to the orientation and the direction of motion of the visual stimulus. We show that this stimulus selectivity is robust, remaining relatively unchanged under different contrasts and temporal frequencies (stimulus velocities). A computational analysis based on an integrate-and-fire model of the direct thalamic input to a layer 4 cortical cell reveals a strong correlation between the degree of thalamic synchrony and the nonlinear relationship between cortical membrane potential and the resultant firing rate. Together, these findings suggest a novel population code in the synchronous firing of neurons in the early visual pathway that could serve as the substrate for establishing cortical representations of the visual scene.

preprint publisher's site Project Page [BibTex]

preprint publisher's site Project Page [BibTex]


Bilinear Spatiotemporal Basis Models
Bilinear Spatiotemporal Basis Models

Akhter, I., Simon, T., Khan, S., Matthews, I., Sheikh, Y.

ACM Transactions on Graphics (TOG), 31(2):17, ACM, April 2012 (article)

Abstract
A variety of dynamic objects, such as faces, bodies, and cloth, are represented in computer graphics as a collection of moving spatial landmarks. Spatiotemporal data is inherent in a number of graphics applications including animation, simulation, and object and camera tracking. The principal modes of variation in the spatial geometry of objects are typically modeled using dimensionality reduction techniques, while concurrently, trajectory representations like splines and autoregressive models are widely used to exploit the temporal regularity of deformation. In this article, we present the bilinear spatiotemporal basis as a model that simultaneously exploits spatial and temporal regularity while maintaining the ability to generalize well to new sequences. This factorization allows the use of analytical, predefined functions to represent temporal variation (e.g., B-Splines or the Discrete Cosine Transform) resulting in efficient model representation and estimation. The model can be interpreted as representing the data as a linear combination of spatiotemporal sequences consisting of shape modes oscillating over time at key frequencies. We apply the bilinear model to natural spatiotemporal phenomena, including face, body, and cloth motion data, and compare it in terms of compaction, generalization ability, predictive precision, and efficiency to existing models. We demonstrate the application of the model to a number of graphics tasks including labeling, gap-filling, denoising, and motion touch-up.

pdf project page link (url) [BibTex]

pdf project page link (url) [BibTex]


{High Resolution Surface Reconstruction from Multi-view Aerial Imagery}
High Resolution Surface Reconstruction from Multi-view Aerial Imagery

Calakli, F., Ulusoy, A. O., Restrepo, M. I., Taubin, G., Mundy, J. L.

In 3D Imaging Modeling Processing Visualization Transmission (3DIMPVT), pages: 25-32, IEEE, 2012 (inproceedings)

Abstract
This paper presents a novel framework for surface reconstruction from multi-view aerial imagery of large scale urban scenes, which combines probabilistic volumetric modeling with smooth signed distance surface estimation, to produce very detailed and accurate surfaces. Using a continuous probabilistic volumetric model which allows for explicit representation of ambiguities caused by moving objects, reflective surfaces, areas of constant appearance, and self-occlusions, the algorithm learns the geometry and appearance of a scene from a calibrated image sequence. An online implementation of Bayesian learning precess in GPUs significantly reduces the time required to process a large number of images. The probabilistic volumetric model of occupancy is subsequently used to estimate a smooth approximation of the signed distance function to the surface. This step, which reduces to the solution of a sparse linear system, is very efficient and scalable to large data sets. The proposed algorithm is shown to produce high quality surfaces in challenging aerial scenes where previous methods make large errors in surface localization. The general applicability of the algorithm beyond aerial imagery is confirmed against the Middlebury benchmark.

Video pdf link (url) DOI [BibTex]

Video pdf link (url) DOI [BibTex]