

2015


Long Range Motion Estimation and Applications

Sevilla-Lara, L.

Long Range Motion Estimation and Applications, University of Massachusetts Amherst, February 2015 (phdthesis)

Abstract
Finding correspondences between images underlies many computer vision problems, such as optical flow, tracking, stereo vision and alignment. Finding these correspondences involves formulating a matching function and optimizing it. This optimization process is often gradient descent, which avoids exhaustive search but relies on the assumption of being in the basin of attraction of the right local minimum. This is often the case when the displacement is small, and current methods obtain very accurate results for small motions. However, when the motion is large and the matching function is bumpy this assumption is less likely to be true. One traditional way of avoiding this abruptness is to smooth the matching function spatially by blurring the images. As the displacement becomes larger, the amount of blur required to smooth the matching function also becomes larger. This averaging of pixels leads to a loss of detail in the image. Therefore, there is a trade-off between the size of the objects that can be tracked and the displacement that can be captured. In this thesis we address the basic problem of increasing the size of the basin of attraction in a matching function. We use an image descriptor called distribution fields (DFs). By blurring the images in DF space instead of in pixel space, we increase the size of the basin of attraction with respect to traditional methods. We show competitive results using DFs both in object tracking and optical flow. Finally, we demonstrate an application of capturing large motions for temporal video stitching.
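The core trick, blurring in descriptor space rather than pixel space, can be illustrated with a minimal 1-D sketch: each sample is exploded into per-intensity-bin channels, and each channel is then blurred spatially. The bin count and blur radius below are arbitrary illustrative choices, not settings from the thesis.

```python
def distribution_field(signal, num_bins=4, radius=1):
    """Build a 1-D distribution field: explode intensities (in [0, 1))
    into per-bin indicator channels, then box-blur each channel.

    Unlike blurring the raw signal, this records WHICH intensities
    occur near each position instead of averaging them away, which is
    what widens the basin of attraction for matching.
    """
    n = len(signal)
    # hard assignment: channels[b][x] = 1 if sample x falls in bin b
    channels = [[0.0] * n for _ in range(num_bins)]
    for x, v in enumerate(signal):
        b = min(int(v * num_bins), num_bins - 1)
        channels[b][x] = 1.0
    # spatial box blur of each channel (window clamped at the borders)
    blurred = []
    for ch in channels:
        out = []
        for x in range(n):
            lo, hi = max(0, x - radius), min(n, x + radius + 1)
            out.append(sum(ch[lo:hi]) / (hi - lo))
        blurred.append(out)
    return blurred
```

Each column of the result is a local histogram of nearby intensities; matching two signals then compares these histograms rather than blurred pixel values.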

[BibTex]



Spike train SIMilarity Space (SSIMS): A framework for single neuron and ensemble data analysis

Vargas-Irwin, C. E., Brandman, D. M., Zimmermann, J. B., Donoghue, J. P., Black, M. J.

Neural Computation, 27(1):1-31, MIT Press, January 2015 (article)

Abstract
We present a method to evaluate the relative similarity of neural spiking patterns by combining spike train distance metrics with dimensionality reduction. Spike train distance metrics provide an estimate of similarity between activity patterns at multiple temporal resolutions. Vectors of pair-wise distances are used to represent the intrinsic relationships between multiple activity patterns at the level of single units or neuronal ensembles. Dimensionality reduction is then used to project the data into concise representations suitable for clustering analysis as well as exploratory visualization. Algorithm performance and robustness are evaluated using multielectrode ensemble activity data recorded in behaving primates. We demonstrate how Spike train SIMilarity Space (SSIMS) analysis captures the relationship between goal directions for an 8-directional reaching task and successfully segregates grasp types in a 3D grasping task in the absence of kinematic information. The algorithm enables exploration of virtually any type of neural spiking (time series) data, providing similarity-based clustering of neural activity states with minimal assumptions about potential information encoding models.
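The first stage of such a pipeline is a spike-train distance matrix. As a generic illustration (not necessarily the metric or parameters used in the paper), the Victor-Purpura edit distance counts the cheapest sequence of spike insertions, deletions (cost 1 each) and shifts (cost q per unit time) turning one train into another:

```python
def victor_purpura(train_a, train_b, q):
    """Victor-Purpura spike-train distance: minimum total cost of
    turning train_a into train_b, where deleting or inserting a spike
    costs 1 and shifting a spike by dt costs q * |dt|."""
    n, m = len(train_a), len(train_b)
    # G[i][j] = distance between the first i spikes of a and first j of b
    G = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        G[i][0] = float(i)
    for j in range(1, m + 1):
        G[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = min(
                G[i - 1][j] + 1.0,  # delete spike i of train_a
                G[i][j - 1] + 1.0,  # insert spike j of train_b
                G[i - 1][j - 1] + q * abs(train_a[i - 1] - train_b[j - 1]),  # shift
            )
    return G[n][m]
```

Stacking the pairwise distances from every train to every other train gives the vectors that dimensionality reduction (e.g. PCA or t-SNE) then projects into a low-dimensional similarity space.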

pdf: publisher site pdf: author's proof DOI Project Page [BibTex]



Efficient Facade Segmentation using Auto-Context

Jampani, V., Gadde, R., Gehler, P. V.

In 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), pages: 1038-1045, IEEE, January 2015 (inproceedings)

Abstract
In this paper we propose a system for the problem of facade segmentation. Building facades are highly structured images, and consequently most methods proposed for this problem aim to exploit this strong prior information. We describe a system that is almost domain independent and consists of standard segmentation methods. A sequence of boosted decision trees is stacked using auto-context features and learned using the stacked generalization technique. We find that this, albeit standard, technique matches or outperforms all previously published empirical results on all available facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test-time inference.

website pdf supplementary IEEE page link (url) DOI Project Page [BibTex]



Norm-induced entropies for decision forests

Lassner, C., Lienhart, R.

IEEE Winter Conference on Applications of Computer Vision (WACV), January 2015 (conference)

Abstract
The entropy measurement function is a central element of decision forest induction. The Shannon entropy and other generalized entropies such as the Renyi and Tsallis entropies are designed to fulfill the Khinchin-Shannon axioms. Whereas these axioms are appropriate for physical systems, they do not necessarily model well the artificial system of decision forest induction. In this paper, we show that when omitting two of the four axioms, every norm induces an entropy function. The remaining two axioms are sufficient to describe the requirements for an entropy function in the decision forest context. Furthermore, we introduce and analyze the p-norm-induced entropy, showing its relations to existing entropies and to various heuristics that are commonly used for decision forest training. In experiments with classification, regression and the recently introduced Hough forests, we show how the discrete and differential forms of the new entropy can be used for forest induction and how the functions can easily be fine-tuned. The experiments indicate that the impact of the entropy function is limited; however, it can be a simple and useful post-processing step for optimizing decision forests for high-performance applications.
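To make the idea concrete, here is a toy norm-induced impurity, under the assumption H_p(q) = 1 - ||q||_p for p > 1 (an illustrative form, not the paper's exact definition): it vanishes on pure class distributions, is maximal on uniform ones, and as p grows it approaches the misclassification-error heuristic 1 - max_i q_i.

```python
def p_norm_entropy(q, p=2.0):
    """Toy norm-induced impurity H_p(q) = 1 - ||q||_p for a class
    distribution q (non-negative entries summing to 1) and p > 1.

    H_p is 0 exactly when q is one-hot (a pure node) and largest when
    q is uniform: the two properties a split criterion needs.
    """
    assert p > 1.0 and abs(sum(q) - 1.0) < 1e-9
    return 1.0 - sum(x ** p for x in q) ** (1.0 / p)
```

A forest trainer would evaluate this on the left and right child label distributions of each candidate split and pick the split with the lowest weighted impurity, just as with Shannon entropy or Gini.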

pdf code [BibTex]



Dataset Suite for Benchmarking Perception in Robotics

Ahmad, A., Lima, P.

In International Conference on Intelligent Robots and Systems (IROS), 2015 (inproceedings)

[BibTex]



FlowCap: 2D Human Pose from Optical Flow

Romero, J., Loper, M., Black, M. J.

In Pattern Recognition, Proc. 37th German Conference on Pattern Recognition (GCPR), LNCS 9358, pages: 412-423, Springer, 2015 (inproceedings)

Abstract
We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance, but unlike depth it can be computed directly from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving there is no optical flow and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and velocities over time and a regression method to predict 2D body pose from part centers. No range sensor is required and FlowCap estimates 2D human pose from monocular video sources containing human motion. Such sources include hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.
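The role of the tracking stage can be illustrated with a fixed-gain constant-velocity tracker (an alpha-beta filter), a deliberately simplified stand-in for the paper's Kalman filter; the 1-D coordinate, gains and time step here are illustrative only.

```python
def alpha_beta_track(measurements, dt=1.0, alpha=0.5, beta=0.1):
    """Track a 1-D coordinate with a constant-velocity model and fixed
    correction gains: predict with the current velocity estimate, then
    pull the prediction toward each new measurement."""
    x, v = measurements[0], 0.0
    estimates = []
    for z in measurements[1:]:
        # predict with the current velocity, then correct toward z
        x_pred = x + v * dt
        resid = z - x_pred
        x = x_pred + alpha * resid
        v = v + beta * resid / dt
        estimates.append(x)
    return estimates
```

The predict step (x + v * dt) is what lets a filter coast through frames with little or no flow; a full Kalman filter additionally adapts the gains from the noise statistics instead of fixing alpha and beta.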

video pdf preprint Project Page Project Page [BibTex]



Towards Optimal Robot Navigation in Urban Homes

Ventura, R., Ahmad, A.

In RoboCup 2014: Robot World Cup XVIII, pages: 318-331, Lecture Notes in Computer Science ; 8992, Springer, Cham, Switzerland, 2015 (inproceedings)

Abstract
The work presented in this paper is motivated by the goal of dependable autonomous navigation of mobile robots. This goal is a fundamental requirement for having autonomous robots in spaces such as domestic spaces and public establishments, left unattended by technical staff. In this paper we tackle this problem by taking an optimization approach: on one hand, we use a Fast Marching Approach for path planning, resulting in optimal paths in the absence of unmapped obstacles; on the other hand, we use a Dynamic Window Approach for guidance. To the best of our knowledge, the combination of these two methods is novel. We evaluate the approach on a real mobile robot capable of moving at high speed. The evaluation makes use of an external ground-truth system. We report on controlled experiments that we performed, including some with people moving randomly near the robot. In our long-term experiments we report a total distance of 18 km traveled during 11 hours of movement time.

DOI [BibTex]



Metric Regression Forests for Correspondence Estimation

Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., Fitzgibbon, A.

International Journal of Computer Vision, pages: 1-13, 2015 (article)

springer PDF Project Page [BibTex]



Joint 3D Object and Layout Inference from a single RGB-D Image

(Best Paper Award)

Geiger, A., Wang, C.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 183-195, Lecture Notes in Computer Science, Springer International Publishing, 2015 (inproceedings)

Abstract
Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task. Towards this goal, we propose a high-order graphical model and jointly reason about the layout, objects and superpixels in the image. In contrast to existing holistic approaches, our model leverages detailed 3D geometry using inverse graphics and explicitly enforces occlusion and visibility constraints for respecting scene properties and projective geometry. We cast the task as MAP inference in a factor graph and solve it efficiently using message passing. We evaluate our method with respect to several baselines on the challenging NYUv2 indoor dataset using 21 object categories. Our experiments demonstrate that the proposed method is able to infer scenes with a large degree of clutter and occlusions.

pdf suppmat video project DOI [BibTex]



3D Object Class Detection in the Wild

Pepik, B., Stark, M., Gehler, P., Ritschel, T., Schiele, B.

In Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2015 (inproceedings)

Project Page [BibTex]



Discrete Optimization for Optical Flow

Menze, M., Heipke, C., Geiger, A.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 16-28, Springer International Publishing, 2015 (inproceedings)

Abstract
We propose to look at large-displacement optical flow from a discrete point of view. Motivated by the observation that sub-pixel accuracy is easily obtained given pixel-accurate optical flow, we conjecture that computing the integral part is the hardest piece of the problem. Consequently, we formulate optical flow estimation as a discrete inference problem in a conditional random field, followed by sub-pixel refinement. Naive discretization of the 2D flow space, however, is intractable due to the resulting size of the label set. In this paper, we therefore investigate three different strategies, each able to reduce computation and memory demands by several orders of magnitude. Their combination allows us to estimate large-displacement optical flow both accurately and efficiently and demonstrates the potential of discrete optimization for optical flow. We obtain state-of-the-art performance on MPI Sintel and KITTI.
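The observation driving the paper (integer flow is the hard part; sub-pixel refinement is easy once you have it) can be shown in 1-D: exhaustively score integer displacements, then fit a parabola through the costs around the winner. This is a toy sketch and omits the CRF, the label-pruning strategies, and everything 2-D.

```python
def subpixel_shift(f, g, max_d):
    """Estimate the shift d such that f[x] ~ g[x + d], in 1-D.

    Stage 1: exhaustive search over integer displacements (the
    'discrete inference' part).  Stage 2: parabola fit through the
    costs at d0-1, d0, d0+1 (the 'sub-pixel refinement' part).
    Assumes max_d is small relative to len(f).
    """
    n = len(f)
    costs = {}
    for d in range(-max_d, max_d + 1):
        pairs = [(f[x], g[x + d]) for x in range(n) if 0 <= x + d < n]
        costs[d] = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    d0 = min(costs, key=costs.get)
    if d0 - 1 in costs and d0 + 1 in costs:
        c_m, c0, c_p = costs[d0 - 1], costs[d0], costs[d0 + 1]
        denom = c_m - 2.0 * c0 + c_p
        if denom > 0:  # proper parabola with a minimum
            return d0 + 0.5 * (c_m - c_p) / denom
    return float(d0)
```

In 2-D the naive label set (all integer (u, v) pairs up to the search range) explodes combinatorially, which is exactly the intractability the paper's three pruning strategies address.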

pdf suppmat project DOI [BibTex]



Joint 3D Estimation of Vehicles and Scene Flow

Menze, M., Heipke, C., Geiger, A.

In Proc. of the ISPRS Workshop on Image Sequence Analysis (ISA), 2015 (inproceedings)

Abstract
Three-dimensional reconstruction of dynamic scenes is an important prerequisite for applications like mobile robotics or autonomous driving. While much progress has been made in recent years, imaging conditions in natural outdoor environments are still very challenging for current reconstruction and recognition methods. In this paper, we propose a novel unified approach which reasons jointly about 3D scene flow as well as the pose, shape and motion of vehicles in the scene. Towards this goal, we incorporate a deformable CAD model into a slanted-plane conditional random field for scene flow estimation and enforce shape consistency between the rendered 3D models and the parameters of all superpixels in the image. The association of superpixels to objects is established by an index variable which implicitly enables model selection. We evaluate our approach on the challenging KITTI scene flow dataset in terms of object and scene flow estimation. Our results provide a proof of concept and demonstrate the usefulness of our method.

PDF [BibTex]



A Setup for multi-UAV hardware-in-the-loop simulations

Odelga, M., Stegagno, P., Bülthoff, H., Ahmad, A.

In pages: 204-210, IEEE, 2015 (inproceedings)

Abstract
In this paper, we present a hardware-in-the-loop simulation setup for multi-UAV systems. With our setup, we are able to command the robots simulated in Gazebo, a popular open-source ROS-enabled physics simulator, using the computational units that are embedded on our quadrotor UAVs. Hence, we can test in simulation not only the correct execution of algorithms, but also their computational feasibility directly on the robot hardware. In addition, since our setup is inherently multi-robot, we can also test the communication flow among the robots. We provide two use cases to show the characteristics of our setup.

link (url) DOI [BibTex]



Smooth Loops from Unconstrained Video

Sevilla-Lara, L., Wulff, J., Sunkavalli, K., Shechtman, E.

In Computer Graphics Forum (Proceedings of EGSR), 34(4):99-107, 2015 (inproceedings)

Abstract
Converting unconstrained video sequences into videos that loop seamlessly is an extremely challenging problem. In this work, we take the first steps towards automating this process by focusing on an important subclass of videos containing a single dominant foreground object. Our technique makes two novel contributions over previous work: first, we propose a correspondence-based similarity metric to automatically identify a good transition point in the video where the appearance and dynamics of the foreground are most consistent. Second, we develop a technique that aligns both the foreground and background about this transition point using a combination of global camera path planning and patch-based video morphing. We demonstrate that this allows us to create natural, compelling, loopy videos from a wide range of videos collected from the internet.

pdf link (url) DOI Project Page [BibTex]



Formation control driven by cooperative object tracking

Lima, P., Ahmad, A., Dias, A., Conceição, A., Moreira, A., Silva, E., Almeida, L., Oliveira, L., Nascimento, T.

Robotics and Autonomous Systems, 63(1):68-79, 2015 (article)

Abstract
In this paper we introduce a formation control loop that maximizes the performance of the cooperative perception of a tracked target by a team of mobile robots, while maintaining the team in formation, with a dynamically adjustable geometry which is a function of the quality of the target perception by the team. In the formation control loop, the controller module is a distributed non-linear model predictive controller and the estimator module fuses local estimates of the target state, obtained by a particle filter at each robot. The two modules and their integration are described in detail, including a real-time database associated with a wireless communication protocol that facilitates the exchange of state data while reducing collisions among team members. Simulation and real robot results for indoor and outdoor teams of different robots are presented. The results highlight how our method successfully enables a team of homogeneous robots to minimize the total uncertainty of the tracked target cooperative estimate while complying with performance criteria such as keeping a pre-set distance between the teammates and the target, and avoiding collisions with teammates and/or surrounding obstacles.

DOI [BibTex]



Onboard robust person detection and tracking for domestic service robots

Sanz, D., Ahmad, A., Lima, P.

In Robot 2015: Second Iberian Robotics Conference, pages: 547-559, Advances in Intelligent Systems and Computing ; 418, Springer, Cham, Switzerland, 2015 (inproceedings)

Abstract
Domestic assistance for the elderly and impaired people is one of the biggest upcoming challenges of our society. Consequently, in-home care through domestic service robots is identified as one of the most important application areas of robotics research. Assistive tasks may range from visitor reception at the door to catering for the owner's small daily necessities within a house. Since most of these tasks require the robot to interact directly with humans, a predominant robot functionality is to detect and track humans in real time: either the owner of the robot, visitors at home, or both. In this article we present a robust method for such a functionality that combines depth-based segmentation and visual detection. The robustness of our method lies in its capability to not only identify partially occluded humans (e.g., with only the torso visible) but also to do so in varying lighting conditions. We thoroughly validate our method through extensive experiments on real robot datasets and comparisons with the ground truth. The datasets were collected in a home-like environment set up within the context of the RoboCup@Home and RoCKIn@Home competitions.

DOI [BibTex]


2012


Assessment of Computational Visual Attention Models on Medical Images

Jampani, V., Ujjwal, , Sivaswamy, J., Vaidya, V.

Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing, pages: 80:1-80:8, ACM, Mumbai, India, December 2012 (conference)

Abstract
Visual attention plays a major role in our lives. Our very perception (which very much decides our survival) depends on it: perceiving a predator while walking through a forest, perceiving a fast car coming from the front on a busy road, or even spotting our favorite color out of many. In medical imaging, where medical experts have to take major clinical decisions based on the examination of images of various kinds (CT, MRI, etc.), visual attention plays a pivotal role. It makes the medical experts fixate on any abnormal behavior exhibited in the medical image and helps in speedy diagnosis. Many previous works (see the paper for details) have exhibited this important fact, and the model proposed by Nodine and Kundel highlights the important role of visual attention in medical image diagnosis. Visual attention involves two components: Bottom-Up and Top-Down. In the present work, we examine a number of established computational models of visual attention in the context of chest X-rays (infected with Pneumoconiosis) and retinal images (having hard exudates). The fundamental motivation is to try to understand the applicability of visual attention models in the context of different types of abnormalities. Our assessment of four popular visual attention models is extensive and shows that they are able to pick up abnormal features reasonably well. We compare the models towards detecting subtle abnormalities and high-contrast lesions. Although significant scope for improvement exists, especially in picking up more subtle abnormalities and becoming more selective (picking up more abnormalities and fewer normal structures), the presented assessment shows that visual attention indeed holds promise for inclusion in the main field of medical image analysis.

url pdf poster link (url) [BibTex]



Virtual Human Bodies with Clothing and Hair: From Images to Animation

Guan, P.

Brown University, Department of Computer Science, December 2012 (phdthesis)

pdf [BibTex]



An SVD-Based Approach for Ghost Detection and Removal in High Dynamic Range Images

Srikantha, A., Sidibe, D., Meriaudeau, F.

International Conference on Pattern Recognition (ICPR), pages: 380-383, November 2012 (article)

pdf [BibTex]



Coregistration: Supplemental Material

Hirshberg, D., Loper, M., Rachlin, E., Black, M. J.

(No. 4), Max Planck Institute for Intelligent Systems, October 2012 (techreport)

pdf [BibTex]



Lie Bodies: A Manifold Representation of 3D Human Shape

Freifeld, O., Black, M. J.

In European Conf. on Computer Vision (ECCV), pages: 1-14, Part I, LNCS 7572, (Editors: A. Fitzgibbon et al.), Springer-Verlag, October 2012 (inproceedings)

Abstract
Three-dimensional object shape is commonly represented in terms of deformations of a triangular mesh from an exemplar shape. Existing models, however, are based on a Euclidean representation of shape deformations. In contrast, we argue that shape has a manifold structure: For example, summing the shape deformations for two people does not necessarily yield a deformation corresponding to a valid human shape, nor does the Euclidean difference of these two deformations provide a meaningful measure of shape dissimilarity. Consequently, we define a novel manifold for shape representation, with emphasis on body shapes, using a new Lie group of deformations. This has several advantages. First we define triangle deformations exactly, removing non-physical deformations and redundant degrees of freedom common to previous methods. Second, the Riemannian structure of Lie Bodies enables a more meaningful definition of body shape similarity by measuring distance between bodies on the manifold of body shape deformations. Third, the group structure allows the valid composition of deformations. This is important for models that factor body shape deformations into multiple causes or represent shape as a linear combination of basis shapes. Finally, body shape variation is modeled using statistics on manifolds. Instead of modeling Euclidean shape variation with Principal Component Analysis we capture shape variation on the manifold using Principal Geodesic Analysis. Our experiments show consistent visual and quantitative advantages of Lie Bodies over traditional Euclidean models of shape deformation and our representation can be easily incorporated into existing methods.

pdf supplemental material youtube poster eigenshape video code Project Page Project Page Project Page [BibTex]



Coregistration: Simultaneous alignment and modeling of articulated 3D shape

Hirshberg, D., Loper, M., Rachlin, E., Black, M.

In European Conf. on Computer Vision (ECCV), pages: 242-255, LNCS 7577, Part IV, (Editors: A. Fitzgibbon et al.), Springer-Verlag, October 2012 (inproceedings)

Abstract
Three-dimensional (3D) shape models are powerful because they enable the inference of object shape from incomplete, noisy, or ambiguous 2D or 3D data. For example, realistic parameterized 3D human body models have been used to infer the shape and pose of people from images. To train such models, a corpus of 3D body scans is typically brought into registration by aligning a common 3D human-shaped template to each scan. This is an ill-posed problem that typically involves solving an optimization problem with regularization terms that penalize implausible deformations of the template. When aligning a corpus, however, we can do better than generic regularization. If we have a model of how the template can deform then alignments can be regularized by this model. Constructing a model of deformations, however, requires having a corpus that is already registered. We address this chicken-and-egg problem by approaching modeling and registration together. By minimizing a single objective function, we reliably obtain high quality registration of noisy, incomplete laser scans, while simultaneously learning a highly realistic articulated body model. The model greatly improves robustness to noise and missing data. Since the model explains a corpus of body scans, it captures how body shape varies across people and poses.

pdf publisher site poster supplemental material (400MB) Project Page Project Page [BibTex]



Lie Bodies: A Manifold Representation of 3D Human Shape. Supplemental Material

Freifeld, O., Black, M. J.

(No. 5), Max Planck Institute for Intelligent Systems, October 2012 (techreport)

pdf Project Page [BibTex]



Coupled Action Recognition and Pose Estimation from Multiple Views

Yao, A., Gall, J., van Gool, L.

International Journal of Computer Vision, 100(1):16-37, October 2012 (article)

publisher's site code pdf Project Page Project Page Project Page [BibTex]



MPI-Sintel Optical Flow Benchmark: Supplemental Material

Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J.

(No. 6), Max Planck Institute for Intelligent Systems, October 2012 (techreport)

pdf Project Page [BibTex]



Lessons and insights from creating a synthetic optical flow benchmark

Wulff, J., Butler, D. J., Stanley, G. B., Black, M. J.

In ECCV Workshop on Unsolved Problems in Optical Flow and Stereo Estimation, pages: 168-177, Part II, LNCS 7584, (Editors: A. Fusiello et al.), Springer-Verlag, October 2012 (inproceedings)

pdf dataset poster youtube Project Page [BibTex]



3D2PM – 3D Deformable Part Models

Pepik, B., Gehler, P., Stark, M., Schiele, B.

In Proceedings of the European Conference on Computer Vision (ECCV), pages: 356-370, Lecture Notes in Computer Science, (Editors: Fitzgibbon, Andrew W. and Lazebnik, Svetlana and Perona, Pietro and Sato, Yoichi and Schmid, Cordelia), Springer, Firenze, October 2012 (inproceedings)

pdf video poster Project Page [BibTex]



A naturalistic open source movie for optical flow evaluation

Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J.

In European Conf. on Computer Vision (ECCV), pages: 611-625, Part IV, LNCS 7577, (Editors: A. Fitzgibbon et al.), Springer-Verlag, October 2012 (inproceedings)

Abstract
Ground truth optical flow is difficult to measure in real scenes with natural motion. As a result, optical flow data sets are restricted in terms of size, complexity, and diversity, making optical flow algorithms difficult to train and test on realistic data. We introduce a new optical flow data set derived from the open source 3D animated short film Sintel. This data set has important features not present in the popular Middlebury flow evaluation: long sequences, large motions, specular reflections, motion blur, defocus blur, and atmospheric effects. Because the graphics data that generated the movie is open source, we are able to render scenes under conditions of varying complexity to evaluate where existing flow algorithms fail. We evaluate several recent optical flow algorithms and find that current highly-ranked methods on the Middlebury evaluation have difficulty with this more complex data set suggesting further research on optical flow estimation is needed. To validate the use of synthetic data, we compare the image- and flow-statistics of Sintel to those of real films and videos and show that they are similar. The data set, metrics, and evaluation website are publicly available.

pdf dataset youtube talk supplemental material Project Page Project Page [BibTex]



Characterization of 3-D Volumetric Probabilistic Scenes for Object Recognition

Restrepo, M. I., Mayer, B. A., Ulusoy, A. O., Mundy, J. L.

IEEE Journal of Selected Topics in Signal Processing, 6(5):522-537, September 2012 (article)

Abstract
This paper presents a new volumetric representation for categorizing objects in large-scale 3-D scenes reconstructed from image sequences. This work uses a probabilistic volumetric model (PVM) that combines the ideas of background modeling and volumetric multi-view reconstruction to handle the uncertainty inherent in the problem of reconstructing 3-D structures from 2-D images. The advantages of probabilistic modeling have been demonstrated by recent application of the PVM representation to video image registration, change detection and classification of changes based on PVM context. The applications just mentioned, operate on 2-D projections of the PVM. This paper presents the first work to characterize and use the local 3-D information in the scenes. Two approaches to local feature description are proposed and compared: 1) features derived from a PCA analysis of model neighborhoods; and 2) features derived from the coefficients of a 3-D Taylor series expansion within each neighborhood. The resulting description is used in a bag-of-features approach to classify buildings, houses, cars, planes, and parking lots learned from aerial imagery collected over Providence, RI. It is shown that both feature descriptions explain the data with similar accuracy and their effectiveness for dense-feature categorization is compared for the different classes. Finally, 3-D extensions of the Harris corner detector and a Hessian-based detector are used to detect salient features. Both types of salient features are evaluated through object categorization experiments, where only features with maximal response are retained. For most saliency criteria tested, features based on the determinant of the Hessian achieved higher classification accuracy than Harris-based features.

pdf DOI [BibTex]



A framework for relating neural activity to freely moving behavior

Foster, J. D., Nuyujukian, P., Freifeld, O., Ryu, S., Black, M. J., Shenoy, K. V.

In 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’12), pages: 2736-2739, IEEE, San Diego, August 2012 (inproceedings)

pdf Project Page [BibTex]



Pottics – The Potts Topic Model for Semantic Image Segmentation

Dann, C., Gehler, P., Roth, S., Nowozin, S.

In Proceedings of 34th DAGM Symposium, pages: 397-407, Lecture Notes in Computer Science, (Editors: Pinz, Axel and Pock, Thomas and Bischof, Horst and Leberl, Franz), Springer, August 2012 (inproceedings)

code pdf poster [BibTex]



Psoriasis segmentation through chromatic regions and Geometric Active Contours

Bogo, F., Samory, M., Belloni Fortina, A., Piaserico, S., Peserico, E.

In 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’12), pages: 5388-5391, San Diego, August 2012 (inproceedings)

pdf [BibTex]



PCA-enhanced stochastic optimization methods

Kuznetsova, A., Pons-Moll, G., Rosenhahn, B.

In German Conference on Pattern Recognition (GCPR), August 2012 (inproceedings)

pdf [BibTex]



Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

In Proceedings of the 29th International Conference on Machine Learning, pages: 25-32, ICML ’12, (Editors: John Langford and Joelle Pineau), Omnipress, New York, NY, USA, July 2012 (inproceedings)

Abstract
Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.
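For context on the local-quadratic-fit view, a minimal sketch (not the paper's method) of the classic BFGS update: it adjusts the Hessian estimate so that the secant equation B_new @ s = y holds, i.e. it "regresses" observed gradient changes onto steps.

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update of the Hessian estimate B.

    s = x_new - x_old, y = grad_new - grad_old. The updated matrix
    satisfies the secant equation B_new @ s = y, fitting the local
    quadratic model to the newest gradient observation.
    """
    Bs = B @ s
    return (B
            - np.outer(Bs, Bs) / (s @ Bs)
            + np.outer(y, y) / (y @ s))

# For a quadratic f(x) = 0.5 x^T A x, gradient differences are y = A @ s,
# so one update already reproduces A's action on the step s.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
s = np.array([1.0, 0.5])
y = A @ s
B1 = bfgs_update(np.eye(2), s, y)
```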

website+code pdf link (url) [BibTex]



DRAPE: DRessing Any PErson

Guan, P., Reiss, L., Hirshberg, D., Weiss, A., Black, M. J.

ACM Trans. on Graphics (Proc. SIGGRAPH), 31(4):35:1-35:10, July 2012 (article)

Abstract
We describe a complete system for animating realistic clothing on synthetic bodies of any shape and pose without manual intervention. The key component of the method is a model of clothing called DRAPE (DRessing Any PErson) that is learned from a physics-based simulation of clothing on bodies of different shapes and poses. The DRAPE model has the desirable property of "factoring" clothing deformations due to body shape from those due to pose variation. This factorization provides an approximation to the physical clothing deformation and greatly simplifies clothing synthesis. Given a parameterized model of the human body with known shape and pose parameters, we describe an algorithm that dresses the body with a garment that is customized to fit and possesses realistic wrinkles. DRAPE can be used to dress static bodies or animated sequences with a learned model of the cloth dynamics. Since the method is fully automated, it is appropriate for dressing large numbers of virtual characters of varying shape. The method is significantly more efficient than physical simulation.
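The "factoring" idea can be illustrated with a toy linear model (a hypothetical stand-in, not the DRAPE model itself): vertex displacements split into a shape-driven term and a pose-driven term, so a pose change has the same effect regardless of body shape.

```python
import numpy as np

def factored_deformation(beta, theta, W_shape, W_pose, d_mean):
    """Clothing vertex displacements as the sum of a body-shape term and
    a pose term, mimicking DRAPE's factorization. W_shape, W_pose and
    d_mean stand in for the learned model; here they are random."""
    return d_mean + W_shape @ beta + W_pose @ theta

rng = np.random.default_rng(1)
W_shape = rng.normal(size=(12, 4))   # 4 shape parameters -> 12 vertex coords
W_pose = rng.normal(size=(12, 6))    # 6 pose parameters
d_mean = rng.normal(size=12)
b1, b2 = rng.normal(size=4), rng.normal(size=4)
t1, t2 = rng.normal(size=6), rng.normal(size=6)

# Factorization: the effect of a pose change is independent of body shape.
pose_effect_b1 = (factored_deformation(b1, t1, W_shape, W_pose, d_mean)
                  - factored_deformation(b1, t2, W_shape, W_pose, d_mean))
pose_effect_b2 = (factored_deformation(b2, t1, W_shape, W_pose, d_mean)
                  - factored_deformation(b2, t2, W_shape, W_pose, d_mean))
```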

YouTube pdf talk Project Page Project Page [BibTex]



Learning Search Based Inference for Object Detection

Gehler, P., Lehmann, A.

In International Conference on Machine Learning (ICML) workshop on Inferning: Interactions between Inference and Learning, Edinburgh, Scotland, UK, July 2012, short version of BMVC11 paper (http://ps.is.tue.mpg.de/publications/31/get_file) (inproceedings)

pdf [BibTex]



Ghost Detection and Removal for High Dynamic Range Images: Recent Advances

Srikantha, A., Sidibé, D.

Signal Processing: Image Communication, 27, pages: 650-662, July 2012 (article)

pdf link (url) [BibTex]



From Pixels to Layers: Joint Motion Estimation and Segmentation

Sun, D.

Brown University, Department of Computer Science, July 2012 (phdthesis)

pdf [BibTex]



Distribution Fields for Tracking

Sevilla-Lara, L., Learned-Miller, E.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, June 2012 (inproceedings)

Abstract
Visual tracking of general objects often relies on the assumption that gradient descent of the alignment function will reach the global optimum. A common technique to smooth the objective function is to blur the image. However, blurring the image destroys image information, which can cause the target to be lost. To address this problem we introduce a method for building an image descriptor using distribution fields (DFs), a representation that allows smoothing the objective function without destroying information about pixel values. We present experimental evidence on the superiority of the width of the basin of attraction around the global optimum of DFs over other descriptors. DFs also allow the representation of uncertainty about the tracked object. This helps in disregarding outliers during tracking (like occlusions or small misalignments) without modeling them explicitly. Finally, this provides a convenient way to aggregate the observations of the object through time and maintain an updated model. We present a simple tracking algorithm that uses DFs and obtains state-of-the-art results on standard benchmarks.
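The core construction can be sketched in a few lines (a simplified, hypothetical version, not the released Matlab code): the image is "exploded" into one channel per intensity bin and each channel is smoothed, spreading positional uncertainty without averaging pixel values together.

```python
import numpy as np

def distribution_field(img, n_bins=8, blur_width=1):
    """Build a distribution field (DF) from a grayscale image in [0, 1).

    Each pixel contributes to one of n_bins binary channels according to
    its intensity; each channel is then blurred spatially. A box blur is
    used here for simplicity (a Gaussian is more typical)."""
    h, w = img.shape
    bins = np.minimum((img * n_bins).astype(int), n_bins - 1)
    df = np.zeros((n_bins, h, w))
    df[bins, np.arange(h)[:, None], np.arange(w)[None, :]] = 1.0

    # Box blur applied independently to every channel.
    k = 2 * blur_width + 1
    padded = np.pad(df, ((0, 0), (blur_width, blur_width),
                         (blur_width, blur_width)))
    out = np.zeros_like(df)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

img = np.array([[0.1, 0.9], [0.9, 0.1]])
df = distribution_field(img, n_bins=4, blur_width=1)
```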

pdf Matlab code [BibTex]



From pictorial structures to deformable structures

Zuffi, S., Freifeld, O., Black, M. J.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3546-3553, IEEE, June 2012 (inproceedings)

Abstract
Pictorial Structures (PS) define a probabilistic model of 2D articulated objects in images. Typical PS models assume an object can be represented by a set of rigid parts connected with pairwise constraints that define the prior probability of part configurations. These models are widely used to represent non-rigid articulated objects such as humans and animals despite the fact that such objects have parts that deform non-rigidly. Here we define a new Deformable Structures (DS) model that is a natural extension of previous PS models and that captures the non-rigid shape deformation of the parts. Each part in a DS model is represented by a low-dimensional shape deformation space and pairwise potentials between parts capture how the shape varies with pose and the shape of neighboring parts. A key advantage of such a model is that it more accurately models object boundaries. This enables image likelihood models that are more discriminative than previous PS likelihoods. This likelihood is learned using training imagery annotated using a DS “puppet.” We focus on a human DS model learned from 2D projections of a realistic 3D human body model and use it to infer human poses in images using a form of non-parametric belief propagation.
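A per-part low-dimensional shape space of the kind described can be sketched with PCA (an illustrative toy, not the authors' DS model; the contour data here are synthetic):

```python
import numpy as np

def learn_shape_space(contours, k):
    """PCA shape space for a part: each training contour is a flattened
    vector of boundary points, and a part shape is mean + B @ z with
    low-dimensional deformation coefficients z."""
    mean = contours.mean(axis=0)
    _, _, Vt = np.linalg.svd(contours - mean, full_matrices=False)
    B = Vt[:k].T
    return mean, B

def project(contour, mean, B):
    """Deformation coefficients of a contour in the learned shape space."""
    return B.T @ (contour - mean)

# Synthetic contours lying in a 2-D deformation subspace.
rng = np.random.default_rng(2)
true_modes = rng.normal(size=(2, 20))
Z = rng.normal(size=(30, 2))
contours = 5.0 + Z @ true_modes
mean, B = learn_shape_space(contours, k=2)
z0 = project(contours[0], mean, B)
recon = mean + B @ z0
```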

pdf sup mat code poster Project Page Project Page Project Page Project Page [BibTex]



Teaching 3D Geometry to Deformable Part Models

Pepik, B., Stark, M., Gehler, P., Schiele, B.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 3362-3369, IEEE, Providence, RI, USA, June 2012, oral presentation (inproceedings)

pdf DOI Project Page [BibTex]



Visual Servoing on Unknown Objects

Gratal, X., Romero, J., Bohg, J., Kragic, D.

Mechatronics, 22(4):423-435, Elsevier, June 2012, Visual Servoing Special Issue (article)

Abstract
We study visual servoing in a framework of detection and grasping of unknown objects. Classically, visual servoing has been used for applications where the object to be servoed on is known to the robot prior to the task execution. In addition, most of the methods concentrate on aligning the robot hand with the object without grasping it. In our work, visual servoing techniques are used as building blocks in a system capable of detecting and grasping unknown objects in natural scenes. We show how different visual servoing techniques facilitate a complete grasping cycle.
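The standard image-based visual servoing building block referred to here can be sketched as follows (a generic textbook control law, not the paper's full system; the point-feature interaction matrix is a simplified translation-only case):

```python
import numpy as np

def ibvs_velocity(s, s_star, L, lam=0.5):
    """Classic image-based visual servoing law: command the camera
    velocity v = -lam * pinv(L) @ (s - s_star), where L is the
    interaction (image Jacobian) matrix relating camera motion to
    image-feature motion."""
    return -lam * np.linalg.pinv(L) @ (s - s_star)

# Point feature at depth Z, translation-only (3-DoF) interaction matrix.
Z = 1.0
s = np.array([0.2, -0.1])          # current normalized image coordinates
s_star = np.zeros(2)               # desired coordinates (image center)
L = np.array([[-1.0 / Z, 0.0, s[0] / Z],
              [0.0, -1.0 / Z, s[1] / Z]])
v = ibvs_velocity(s, s_star, L)
s_next = s + L @ v * 0.1           # one Euler step of the feature motion
```

With a full-row-rank L, each step contracts the feature error toward zero, which is the exponential-decay behavior the control law is designed for.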

Grasping sequence video Offline calibration video Pdf DOI [BibTex]



Branch-and-price global optimization for multi-view multi-object tracking

Leal-Taixé, L., Pons-Moll, G., Rosenhahn, B.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2012 (inproceedings)

project page paper poster [BibTex]



A physically-based approach to reflection separation

Kong, N., Tai, Y., Shin, S. Y.

In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 9-16, June 2012 (inproceedings)

Abstract
We propose a physically-based approach to separate reflection using multiple polarized images with a background scene captured behind glass. The input consists of three polarized images, each captured from the same view point but with a different polarizer angle separated by 45 degrees. The output is the high-quality separation of the reflection and background layers from each of the input images. A main technical challenge for this problem is that the mixing coefficient for the reflection and background layers depends on the angle of incidence and the orientation of the plane of incidence, which are spatially-varying over the pixels of an image. Exploiting physical properties of polarization for a double-surfaced glass medium, we propose an algorithm which automatically finds the optimal separation of the reflection and background layers. Through experiments, we demonstrate that our approach generates results superior to those of previous methods.
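The three-angle input described above admits a standard per-pixel decomposition (a textbook relation, not the paper's separation algorithm): from measurements at 0, 45 and 90 degrees one can recover total intensity, degree and angle of linear polarization.

```python
import numpy as np

def polarization_from_three(i0, i45, i90):
    """Recover intensity, degree and angle of linear polarization from
    three images with polarizer angles 0, 45 and 90 degrees, under the
    model I(theta) = (It/2) * (1 + p * cos(2*theta - 2*phi))."""
    a = 0.5 * (i0 + i90)          # It / 2
    c = 0.5 * (i0 - i90)          # (It/2) * p * cos(2*phi)
    s = i45 - a                   # (It/2) * p * sin(2*phi)
    intensity = 2.0 * a
    dop = np.sqrt(c**2 + s**2) / np.maximum(a, 1e-12)
    aop = 0.5 * np.arctan2(s, c)
    return intensity, dop, aop

# Synthesize the three measurements from known parameters, then recover them.
it, p, phi = 2.0, 0.5, np.deg2rad(30.0)
meas = [0.5 * it * (1 + p * np.cos(2 * np.deg2rad(th) - 2 * phi))
        for th in (0, 45, 90)]
intensity, dop, aop = polarization_from_three(*meas)
```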

Publisher site [BibTex]



Visual Orientation and Directional Selectivity Through Thalamic Synchrony

Stanley, G., Jin, J., Wang, Y., Desbordes, G., Wang, Q., Black, M., Alonso, J.

Journal of Neuroscience, 32(26):9073-9088, June 2012 (article)

Abstract
Thalamic neurons respond to visual scenes by generating synchronous spike trains on the timescale of 10–20 ms that are very effective at driving cortical targets. Here we demonstrate that this synchronous activity contains unexpectedly rich information about fundamental properties of visual stimuli. We report that the occurrence of synchronous firing of cat thalamic cells with highly overlapping receptive fields is strongly sensitive to the orientation and the direction of motion of the visual stimulus. We show that this stimulus selectivity is robust, remaining relatively unchanged under different contrasts and temporal frequencies (stimulus velocities). A computational analysis based on an integrate-and-fire model of the direct thalamic input to a layer 4 cortical cell reveals a strong correlation between the degree of thalamic synchrony and the nonlinear relationship between cortical membrane potential and the resultant firing rate. Together, these findings suggest a novel population code in the synchronous firing of neurons in the early visual pathway that could serve as the substrate for establishing cortical representations of the visual scene.
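Why synchrony matters to a cortical target can be illustrated with a minimal leaky integrate-and-fire simulation (a toy sketch with made-up parameters, not the paper's computational analysis): tightly synchronized input volleys summate before the membrane decays, while dispersed inputs do not.

```python
import numpy as np

def lif_spike_count(spike_times, tau=10.0, v_th=1.0, w=0.3, dt=0.1,
                    t_max=200.0):
    """Leaky integrate-and-fire neuron driven by input spike times (ms).

    Each input spike adds weight w to the membrane potential v, which
    decays with time constant tau; crossing v_th emits an output spike
    and resets v. Returns the number of output spikes."""
    steps = int(t_max / dt)
    drive = np.zeros(steps)
    for t in spike_times:
        drive[int(t / dt)] += w
    v, count = 0.0, 0
    for i in range(steps):
        v += (-v / tau) * dt + drive[i]
        if v >= v_th:
            count += 1
            v = 0.0
    return count

rng = np.random.default_rng(0)
base = np.repeat(np.arange(10.0, 200.0, 20.0), 5)   # 10 volleys of 5 inputs
sync = np.clip(base + rng.normal(0.0, 0.5, base.size), 0.0, 199.0)
loose = np.clip(base + rng.normal(0.0, 8.0, base.size), 0.0, 199.0)
n_sync, n_loose = lif_spike_count(sync), lif_spike_count(loose)
```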

preprint publisher's site Project Page [BibTex]



An Analysis of Successful Approaches to Human Pose Estimation

Lassner, C.

An Analysis of Successful Approaches to Human Pose Estimation, University of Augsburg, May 2012 (mastersthesis)

Abstract
The field of Human Pose Estimation is developing fast and lately leaped forward with the release of the Kinect system. That system reaches very good performance for pose estimation using 3D scene information; however, pose estimation from 2D color images is not yet solved reliably. There is a vast amount of publications trying to reach this aim, but no compilation of important methods and solution strategies. The aim of this thesis is to fill this gap: it gives an introductory overview of important techniques by analyzing four current (2012) publications in detail. They are chosen such that many frequently used techniques for Human Pose Estimation can be explained during their analysis. The thesis includes two introductory chapters with a definition of Human Pose Estimation and an exploration of the main difficulties, as well as a detailed explanation of frequently used methods. A final chapter presents some ideas on how parts of the analyzed approaches can be recombined and shows some open questions that can be tackled in future work. The thesis is therefore a good entry point to the field of Human Pose Estimation and enables the reader to get an impression of the current state of the art.

pdf [BibTex]



Bilinear Spatiotemporal Basis Models

Akhter, I., Simon, T., Khan, S., Matthews, I., Sheikh, Y.

ACM Transactions on Graphics (TOG), 31(2):17, ACM, April 2012 (article)

Abstract
A variety of dynamic objects, such as faces, bodies, and cloth, are represented in computer graphics as a collection of moving spatial landmarks. Spatiotemporal data is inherent in a number of graphics applications including animation, simulation, and object and camera tracking. The principal modes of variation in the spatial geometry of objects are typically modeled using dimensionality reduction techniques, while concurrently, trajectory representations like splines and autoregressive models are widely used to exploit the temporal regularity of deformation. In this article, we present the bilinear spatiotemporal basis as a model that simultaneously exploits spatial and temporal regularity while maintaining the ability to generalize well to new sequences. This factorization allows the use of analytical, predefined functions to represent temporal variation (e.g., B-Splines or the Discrete Cosine Transform) resulting in efficient model representation and estimation. The model can be interpreted as representing the data as a linear combination of spatiotemporal sequences consisting of shape modes oscillating over time at key frequencies. We apply the bilinear model to natural spatiotemporal phenomena, including face, body, and cloth motion data, and compare it in terms of compaction, generalization ability, predictive precision, and efficiency to existing models. We demonstrate the application of the model to a number of graphics tasks including labeling, gap-filling, denoising, and motion touch-up.
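The bilinear idea can be sketched concretely (a simplified illustration with synthetic data, not the authors' code): fit a frames-by-landmarks data matrix with a truncated DCT temporal basis and an SVD spatial basis, linked by a small coefficient matrix.

```python
import numpy as np

def dct_basis(T, k):
    """First k columns of the orthonormal DCT-II basis over T frames."""
    n = np.arange(T)[:, None]
    f = np.arange(k)[None, :]
    B = np.sqrt(2.0 / T) * np.cos(np.pi * (n + 0.5) * f / T)
    B[:, 0] /= np.sqrt(2.0)
    return B

def bilinear_fit(D, k_t, k_s):
    """Approximate D (frames x landmark coordinates) as Bt @ C @ Bs.T,
    with a truncated DCT temporal basis Bt, a PCA (SVD) spatial basis Bs,
    and a small coefficient matrix C."""
    Bt = dct_basis(D.shape[0], k_t)
    Bs = np.linalg.svd(D, full_matrices=False)[2][:k_s].T
    C = Bt.T @ D @ Bs   # least squares, since both bases are orthonormal
    return Bt @ C @ Bs.T

# Smooth toy trajectories for 6 landmark coordinates over 64 frames.
T = 64
t = np.linspace(0.0, 2.0 * np.pi, T)
D = np.stack([np.sin(t), np.cos(t), np.sin(2 * t),
              np.cos(2 * t), 0.5 * np.sin(t), t / t.max()], axis=1)
recon = bilinear_fit(D, k_t=16, k_s=5)
rel_err = np.linalg.norm(D - recon) / np.linalg.norm(D)
```

The compaction benefit is that the 64 x 6 data are summarized by a 16 x 5 coefficient matrix plus the two fixed bases.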

pdf project page link (url) [BibTex]



Exploiting pedestrian interaction via global optimization and social behaviors

Leal-Taixé, L., Pons-Moll, G., Rosenhahn, B.

In Theoretic Foundations of Computer Vision: Outdoor and Large-Scale Real-World Scene Analysis, Springer, April 2012 (incollection)

pdf [BibTex]



HUMIM Software for Articulated Tracking

Hauberg, S., Pedersen, K. S.

(01/2012), Department of Computer Science, University of Copenhagen, January 2012 (techreport)

Code PDF [BibTex]



A geometric framework for statistics on trees

Feragen, A., Nielsen, M., Hauberg, S., Lo, P., de Bruijne, M., Lauze, F.

(11/02), Department of Computer Science, University of Copenhagen, January 2012 (techreport)

PDF [BibTex]
