Header logo is ps


2014


Thumb xl faust
FAUST: Dataset and evaluation for 3D mesh registration

(Dataset Award, Eurographics Symposium on Geometry Processing (SGP), 2016)

Bogo, F., Romero, J., Loper, M., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3794 -3801, Columbus, Ohio, USA, June 2014 (inproceedings)

Abstract
New scanning technologies are increasing the importance of 3D mesh data and the need for algorithms that can reliably align it. Surface registration is important for building full 3D models from partial scans, creating statistical shape models, shape retrieval, and tracking. The problem is particularly challenging for non-rigid and articulated objects like human bodies. While the challenges of real-world data registration are not present in existing synthetic datasets, establishing ground-truth correspondences for real 3D scans is difficult. We address this with a novel mesh registration technique that combines 3D shape and appearance information to produce high-quality alignments. We define a new dataset called FAUST that contains 300 scans of 10 people in a wide range of poses together with an evaluation methodology. To achieve accurate registration, we paint the subjects with high-frequency textures and use an extensive validation process to ensure accurate ground truth. We find that current shape registration methods have trouble with this real-world data. The dataset and evaluation website are available for research purposes at http://faust.is.tue.mpg.de.

pdf Video Dataset Poster Talk DOI Project Page Project Page Project Page [BibTex]

2014

pdf Video Dataset Poster Talk DOI Project Page Project Page Project Page [BibTex]


Thumb xl modeltransport
Model Transport: Towards Scalable Transfer Learning on Manifolds

Freifeld, O., Hauberg, S., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 1378 -1385, Columbus, Ohio, USA, June 2014 (inproceedings)

Abstract
We consider the intersection of two research fields: transfer learning and statistics on manifolds. In particular, we consider, for manifold-valued data, transfer learning of tangent-space models such as Gaussians distributions, PCA, regression, or classifiers. Though one would hope to simply use ordinary Rn-transfer learning ideas, the manifold structure prevents it. We overcome this by basing our method on inner-product-preserving parallel transport, a well-known tool widely used in other problems of statistics on manifolds in computer vision. At first, this straightforward idea seems to suffer from an obvious shortcoming: Transporting large datasets is prohibitively expensive, hindering scalability. Fortunately, with our approach, we never transport data. Rather, we show how the statistical models themselves can be transported, and prove that for the tangent-space models above, the transport “commutes” with learning. Consequently, our compact framework, applicable to a large class of manifolds, is not restricted by the size of either the training or test sets. We demonstrate the approach by transferring PCA and logistic-regression models of real-world data involving 3D shapes and image descriptors.

pdf SupMat Video poster DOI Project Page [BibTex]

pdf SupMat Video poster DOI Project Page [BibTex]


Thumb xl screen shot 2014 07 09 at 15.49.27
Robot Arm Pose Estimation through Pixel-Wise Part Classification

Bohg, J., Romero, J., Herzog, A., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA) 2014, pages: 3143-3150, June 2014 (inproceedings)

Abstract
We propose to frame the problem of marker-less robot arm pose estimation as a pixel-wise part classification problem. As input, we use a depth image in which each pixel is classified to be either from a particular robot part or the background. The classifier is a random decision forest trained on a large number of synthetically generated and labeled depth images. From all the training samples ending up at a leaf node, a set of offsets is learned that votes for relative joint positions. Pooling these votes over all foreground pixels and subsequent clustering gives us an estimate of the true joint positions. Due to the intrinsic parallelism of pixel-wise classification, this approach can run in super real-time and is more efficient than previous ICP-like methods. We quantitatively evaluate the accuracy of this approach on synthetic data. We also demonstrate that the method produces accurate joint estimates on real data despite being purely trained on synthetic data.

video code pdf DOI Project Page [BibTex]

video code pdf DOI Project Page [BibTex]


Thumb xl dfm
Efficient Non-linear Markov Models for Human Motion

Lehrmann, A. M., Gehler, P. V., Nowozin, S.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 1314-1321, IEEE, June 2014 (inproceedings)

Abstract
Dynamic Bayesian networks such as Hidden Markov Models (HMMs) are successfully used as probabilistic models for human motion. The use of hidden variables makes them expressive models, but inference is only approximate and requires procedures such as particle filters or Markov chain Monte Carlo methods. In this work we propose to instead use simple Markov models that only model observed quantities. We retain a highly expressive dynamic model by using interactions that are nonlinear and non-parametric. A presentation of our approach in terms of latent variables shows logarithmic growth for the computation of exact loglikelihoods in the number of latent states. We validate our model on human motion capture data and demonstrate state-of-the-art performance on action recognition and motion completion tasks.

Project page pdf DOI Project Page [BibTex]

Project page pdf DOI Project Page [BibTex]


Thumb xl grassmann
Grassmann Averages for Scalable Robust PCA

Hauberg, S., Feragen, A., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3810 -3817, Columbus, Ohio, USA, June 2014 (inproceedings)

Abstract
As the collection of large datasets becomes increasingly automated, the occurrence of outliers will increase – "big data" implies "big outliers". While principal component analysis (PCA) is often used to reduce the size of data, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA do not scale beyond small-to-medium sized datasets. To address this, we introduce the Grassmann Average (GA), which expresses dimensionality reduction as an average of the subspaces spanned by the data. Because averages can be efficiently computed, we immediately gain scalability. GA is inherently more robust than PCA, but we show that they coincide for Gaussian data. We exploit that averages can be made robust to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. Robustness can be with respect to vectors (subspaces) or elements of vectors; we focus on the latter and use a trimmed average. The resulting Trimmed Grassmann Average (TGA) is particularly appropriate for computer vision because it is robust to pixel outliers. The algorithm has low computational complexity and minimal memory requirements, making it scalable to "big noisy data." We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie.

pdf code supplementary material tutorial video results video talk poster DOI Project Page [BibTex]

pdf code supplementary material tutorial video results video talk poster DOI Project Page [BibTex]


Thumb xl 3basic posebits
Posebits for Monocular Human Pose Estimation

Pons-Moll, G., Fleet, D. J., Rosenhahn, B.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 2345-2352, Columbus, Ohio, USA, June 2014 (inproceedings)

Abstract
We advocate the inference of qualitative information about 3D human pose, called posebits, from images. Posebits represent boolean geometric relationships between body parts (e.g., left-leg in front of right-leg or hands close to each other). The advantages of posebits as a mid-level representation are 1) for many tasks of interest, such qualitative pose information may be sufficient (e.g. , semantic image retrieval), 2) it is relatively easy to annotate large image corpora with posebits, as it simply requires answers to yes/no questions; and 3) they help resolve challenging pose ambiguities and therefore facilitate the difficult talk of image-based 3D pose estimation. We introduce posebits, a posebit database, a method for selecting useful posebits for pose estimation and a structural SVM model for posebit inference. Experiments show the use of posebits for semantic image retrieval and for improving 3D pose estimation.

pdf Project Page Project Page [BibTex]

pdf Project Page Project Page [BibTex]


Thumb xl roser
Simultaneous Underwater Visibility Assessment, Enhancement and Improved Stereo

Roser, M., Dunbabin, M., Geiger, A.

IEEE International Conference on Robotics and Automation, pages: 3840 - 3847 , Hong Kong, China, June 2014 (conference)

Abstract
Vision-based underwater navigation and obstacle avoidance demands robust computer vision algorithms, particularly for operation in turbid water with reduced visibility. This paper describes a novel method for the simultaneous underwater image quality assessment, visibility enhancement and disparity computation to increase stereo range resolution under dynamic, natural lighting and turbid conditions. The technique estimates the visibility properties from a sparse 3D map of the original degraded image using a physical underwater light attenuation model. Firstly, an iterated distance-adaptive image contrast enhancement enables a dense disparity computation and visibility estimation. Secondly, using a light attenuation model for ocean water, a color corrected stereo underwater image is obtained along with a visibility distance estimate. Experimental results in shallow, naturally lit, high-turbidity coastal environments show the proposed technique improves range estimation over the original images as well as image quality and color for habitat classification. Furthermore, the recursiveness and robustness of the technique allows real-time implementation onboard an Autonomous Underwater Vehicles for improved navigation and obstacle avoidance performance.

pdf DOI [BibTex]

pdf DOI [BibTex]


Thumb xl icmlteaser
Preserving Modes and Messages via Diverse Particle Selection

Pacheco, J., Zuffi, S., Black, M. J., Sudderth, E.

In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 32(1):1152-1160, J. Machine Learning Research Workshop and Conf. and Proc., Beijing, China, June 2014 (inproceedings)

Abstract
In applications of graphical models arising in domains such as computer vision and signal processing, we often seek the most likely configurations of high-dimensional, continuous variables. We develop a particle-based max-product algorithm which maintains a diverse set of posterior mode hypotheses, and is robust to initialization. At each iteration, the set of hypotheses at each node is augmented via stochastic proposals, and then reduced via an efficient selection algorithm. The integer program underlying our optimization-based particle selection minimizes errors in subsequent max-product message updates. This objective automatically encourages diversity in the maintained hypotheses, without requiring tuning of application-specific distances among hypotheses. By avoiding the stochastic resampling steps underlying particle sum-product algorithms, we also avoid common degeneracies where particles collapse onto a single hypothesis. Our approach significantly outperforms previous particle-based algorithms in experiments focusing on the estimation of human pose from single images.

pdf SupMat link (url) Project Page Project Page [BibTex]

pdf SupMat link (url) Project Page Project Page [BibTex]


Thumb xl schoenbein
Calibrating and Centering Quasi-Central Catadioptric Cameras

Schoenbein, M., Strauss, T., Geiger, A.

IEEE International Conference on Robotics and Automation, pages: 4443 - 4450, Hong Kong, China, June 2014 (conference)

Abstract
Non-central catadioptric models are able to cope with irregular camera setups and inaccuracies in the manufacturing process but are computationally demanding and thus not suitable for robotic applications. On the other hand, calibrating a quasi-central (almost central) system with a central model introduces errors due to a wrong relationship between the viewing ray orientations and the pixels on the image sensor. In this paper, we propose a central approximation to quasi-central catadioptric camera systems that is both accurate and efficient. We observe that the distance to points in 3D is typically large compared to deviations from the single viewpoint. Thus, we first calibrate the system using a state-of-the-art non-central camera model. Next, we show that by remapping the observations we are able to match the orientation of the viewing rays of a much simpler single viewpoint model with the true ray orientations. While our approximation is general and applicable to all quasi-central camera systems, we focus on one of the most common cases in practice: hypercatadioptric cameras. We compare our model to a variety of baselines in synthetic and real localization and motion estimation experiments. We show that by using the proposed model we are able to achieve near non-central accuracy while obtaining speed-ups of more than three orders of magnitude compared to state-of-the-art non-central models.

pdf DOI [BibTex]

pdf DOI [BibTex]


Thumb xl pami
3D Traffic Scene Understanding from Movable Platforms

Geiger, A., Lauer, M., Wojek, C., Stiller, C., Urtasun, R.

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(5):1012-1025, published, IEEE, Los Alamitos, CA, May 2014 (article)

Abstract
In this paper, we present a novel probabilistic generative model for multi-object traffic scene understanding from movable platforms which reasons jointly about the 3D scene layout as well as the location and orientation of objects in the scene. In particular, the scene topology, geometry and traffic activities are inferred from short video sequences. Inspired by the impressive driving capabilities of humans, our model does not rely on GPS, lidar or map knowledge. Instead, it takes advantage of a diverse set of visual cues in the form of vehicle tracklets, vanishing points, semantic scene labels, scene flow and occupancy grids. For each of these cues we propose likelihood functions that are integrated into a probabilistic generative model. We learn all model parameters from training data using contrastive divergence. Experiments conducted on videos of 113 representative intersections show that our approach successfully infers the correct layout in a variety of very challenging scenarios. To evaluate the importance of each feature cue, experiments using different feature combinations are conducted. Furthermore, we show how by employing context derived from the proposed method we are able to improve over the state-of-the-art in terms of object detection and object orientation estimation in challenging and cluttered urban environments.

pdf link (url) [BibTex]

pdf link (url) [BibTex]


Thumb xl blueman cropped2
Modeling the Human Body in 3D: Data Registration and Human Shape Representation

Tsoli, A.

Brown University, Department of Computer Science, May 2014 (phdthesis)

pdf [BibTex]

pdf [BibTex]


Thumb xl modeltransport
Model transport: towards scalable transfer learning on manifolds - supplemental material

Freifeld, O., Hauberg, S., Black, M. J.

(9), April 2014 (techreport)

Abstract
This technical report is complementary to "Model Transport: Towards Scalable Transfer Learning on Manifolds" and contains proofs, explanation of the attached video (visualization of bases from the body shape experiments), and high-resolution images of select results of individual reconstructions from the shape experiments. It is identical to the supplemental mate- rial submitted to the Conference on Computer Vision and Pattern Recognition (CVPR 2014) on November 2013.

PDF [BibTex]


Thumb xl aistats2014
Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics

Hennig, P., Hauberg, S.

In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 33, pages: 347-355, JMLR: Workshop and Conference Proceedings, (Editors: S Kaski and J Corander), Microtome Publishing, Brookline, MA, April 2014 (inproceedings)

Abstract
We study a probabilistic numerical method for the solution of both boundary and initial value problems that returns a joint Gaussian process posterior over the solution. Such methods have concrete value in the statistics on Riemannian manifolds, where non-analytic ordinary differential equations are involved in virtually all computations. The probabilistic formulation permits marginalising the uncertainty of the numerical solution such that statistics are less sensitive to inaccuracies. This leads to new Riemannian algorithms for mean value computations and principal geodesic analysis. Marginalisation also means results can be less precise than point estimates, enabling a noticeable speed-up over the state of the art. Our approach is an argument for a wider point that uncertainty caused by numerical calculations should be tracked throughout the pipeline of machine learning algorithms.

pdf Youtube Supplements Project page link (url) [BibTex]

pdf Youtube Supplements Project page link (url) [BibTex]


Thumb xl thumb
Multi-View Priors for Learning Detectors from Sparse Viewpoint Data

Pepik, B., Stark, M., Gehler, P., Schiele, B.

International Conference on Learning Representations, April 2014 (conference)

Abstract
While the majority of today's object class models provide only 2D bounding boxes, far richer output hypotheses are desirable including viewpoint, fine-grained category, and 3D geometry estimate. However, models trained to provide richer output require larger amounts of training data, preferably well covering the relevant aspects such as viewpoint and fine-grained categories. In this paper, we address this issue from the perspective of transfer learning, and design an object class model that explicitly leverages correlations between visual features. Specifically, our model represents prior distributions over permissible multi-view detectors in a parametric way -- the priors are learned once from training data of a source object class, and can later be used to facilitate the learning of a detector for a target class. As we show in our experiments, this transfer is not only beneficial for detectors based on basic-level category representations, but also enables the robust learning of detectors that represent classes at finer levels of granularity, where training data is typically even scarcer and more unbalanced. As a result, we report largely improved performance in simultaneous 2D object localization and viewpoint estimation on a recent dataset of challenging street scenes.

reviews pdf Project Page [BibTex]

reviews pdf Project Page [BibTex]


no image
RoCKIn@Work in a Nutshell

Ahmad, A., Amigoni, A., Awaad, I., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Fontana, G., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schiaffonati, V., Schneider, S.

(FP7-ICT-601012 Revision 1.2), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, March 2014 (techreport)

Abstract
The main purpose of RoCKIn@Work is to foster innovation in industrial service robotics. Innovative robot applications for industry call for the capability to work interactively with humans and reduced initial programming requirements. This will open new opportunities to automate challenging manufacturing processes, even for small to medium-sized lots and highly customer-specific production requirements. Thereby, the RoCKIn competitions pave the way for technology transfer and contribute to the continued commercial competitiveness of European industry.

[BibTex]

[BibTex]


no image
RoCKIn@Home in a Nutshell

Ahmad, A., Amigoni, F., Awaad, I., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Fontana, G., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schneider, S.

(FP7-ICT-601012 Revision 0.8), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, March 2014 (techreport)

Abstract
RoCKIn@Home is a competition that aims at bringing together the benefits of scientific benchmarking with the attraction of scientific competitions in the realm of domestic service robotics. The objectives are to bolster research in service robotics for home applications and to raise public awareness of the current and future capabilities of such robot systems to meet societal challenges like healthy ageing and longer independent living.

[BibTex]

[BibTex]


Thumb xl figure1
NRSfM using Local Rigidity

Rehan, A., Zaheer, A., Akhter, I., Saeed, A., Mahmood, B., Usmani, M., Khan, S.

In Proceedings Winter Conference on Applications of Computer Vision, pages: 69-74, open access, IEEE , Steamboat Springs, CO, USA, March 2014 (inproceedings)

Abstract
Factorization methods for computation of nonrigid structure have limited practicality, and work well only when there is large enough camera motion between frames, with long sequences and limited or no occlusions. We show that typical nonrigid structure can often be approximated well as locally rigid sub-structures in time and space. Specifically, we assume that: 1) the structure can be approximated as rigid in a short local time window and 2) some point pairs stay relatively rigid in space, maintaining a fixed distance between them during the sequence. We first use the triangulation constraints in rigid SFM over a sliding time window to get an initial estimate of the nonrigid 3D structure. We then automatically identify relatively rigid point pairs in this structure, and use their length-constancy simultaneously with triangulation constraints to refine the structure estimate. Unlike factorization methods, the structure is estimated independent of the camera motion computation, adding to the simplicity and stability of the approach. Further, local factorization inherently handles significant natural occlusions gracefully, performing much better than the state-of-the art. We show more stable and accurate results as compared to the state-of-the art on even short sequences starting from 15 frames only, containing camera rotations as small as 2 degree and up to 50% missing data.

link (url) [BibTex]

link (url) [BibTex]


Thumb xl homerjournal
Adaptive Offset Correction for Intracortical Brain Computer Interfaces

Homer, M. L., Perge, J. A., Black, M. J., Harrison, M. T., Cash, S. S., Hochberg, L. R.

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(2):239-248, March 2014 (article)

Abstract
Intracortical brain computer interfaces (iBCIs) decode intended movement from neural activity for the control of external devices such as a robotic arm. Standard approaches include a calibration phase to estimate decoding parameters. During iBCI operation, the statistical properties of the neural activity can depart from those observed during calibration, sometimes hindering a user’s ability to control the iBCI. To address this problem, we adaptively correct the offset terms within a Kalman filter decoder via penalized maximum likelihood estimation. The approach can handle rapid shifts in neural signal behavior (on the order of seconds) and requires no knowledge of the intended movement. The algorithm, called MOCA, was tested using simulated neural activity and evaluated retrospectively using data collected from two people with tetraplegia operating an iBCI. In 19 clinical research test cases, where a nonadaptive Kalman filter yielded relatively high decoding errors, MOCA significantly reduced these errors (10.6 ± 10.1\%; p < 0.05, pairwise t-test). MOCA did not significantly change the error in the remaining 23 cases where a nonadaptive Kalman filter already performed well. These results suggest that MOCA provides more robust decoding than the standard Kalman filter for iBCIs.

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


Thumb xl aggteaser
Model-based Anthropometry: Predicting Measurements from 3D Human Scans in Multiple Poses

Tsoli, A., Loper, M., Black, M. J.

In Proceedings Winter Conference on Applications of Computer Vision, pages: 83-90, IEEE , March 2014 (inproceedings)

Abstract
Extracting anthropometric or tailoring measurements from 3D human body scans is important for applications such as virtual try-on, custom clothing, and online sizing. Existing commercial solutions identify anatomical landmarks on high-resolution 3D scans and then compute distances or circumferences on the scan. Landmark detection is sensitive to acquisition noise (e.g. holes) and these methods require subjects to adopt a specific pose. In contrast, we propose a solution we call model-based anthropometry. We fit a deformable 3D body model to scan data in one or more poses; this model-based fitting is robust to scan noise. This brings the scan into registration with a database of registered body scans. Then, we extract features from the registered model (rather than from the scan); these include, limb lengths, circumferences, and statistical features of global shape. Finally, we learn a mapping from these features to measurements using regularized linear regression. We perform an extensive evaluation using the CAESAR dataset and demonstrate that the accuracy of our method outperforms state-of-the-art methods.

pdf DOI Project Page Project Page [BibTex]

pdf DOI Project Page Project Page [BibTex]


Thumb xl tpami small
A physically-based approach to reflection separation: from physical modeling to constrained optimization

Kong, N., Tai, Y., Shin, J. S.

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(2):209-221, IEEE Computer Society, Febuary 2014 (article)

Abstract
We propose a physically-based approach to separate reflection using multiple polarized images with a background scene captured behind glass. The input consists of three polarized images, each captured from the same view point but with a different polarizer angle separated by 45 degrees. The output is the high-quality separation of the reflection and background layers from each of the input images. A main technical challenge for this problem is that the mixing coefficient for the reflection and background layers depends on the angle of incidence and the orientation of the plane of incidence, which are spatially varying over the pixels of an image. Exploiting physical properties of polarization for a double-surfaced glass medium, we propose a multiscale scheme which automatically finds the optimal separation of the reflection and background layers. Through experiments, we demonstrate that our approach can generate superior results to those of previous methods.

Publisher site [BibTex]

Publisher site [BibTex]


Thumb xl tbme
Simpler, faster, more accurate melanocytic lesion segmentation through MEDS

Peruch, F., Bogo, F., Bonazza, M., Cappelleri, V., Peserico, E.

IEEE Transactions on Biomedical Engineering, 61(2):557-565, February 2014 (article)

DOI [BibTex]

DOI [BibTex]


Thumb xl iccv2013 siyu 1
Learning People Detectors for Tracking in Crowded Scenes.

Tang, S., Andriluka, M., Milan, A., Schindler, K., Roth, S., Schiele, B.

2014, Scene Understanding Workshop (SUNw, CVPR workshop) (unpublished)

[BibTex]

[BibTex]


Thumb xl isprs2014
Evaluation of feature-based 3-d registration of probabilistic volumetric scenes

Restrepo, M. I., Ulusoy, A. O., Mundy, J. L.

In ISPRS Journal of Photogrammetry and Remote Sensing, 98(0):1-18, 2014 (inproceedings)

Abstract
Automatic estimation of the world surfaces from aerial images has seen much attention and progress in recent years. Among current modeling technologies, probabilistic volumetric models (PVMs) have evolved as an alternative representation that can learn geometry and appearance in a dense and probabilistic manner. Recent progress, in terms of storage and speed, achieved in the area of volumetric modeling, opens the opportunity to develop new frameworks that make use of the {PVM} to pursue the ultimate goal of creating an entire map of the earth, where one can reason about the semantics and dynamics of the 3-d world. Aligning 3-d models collected at different time-instances constitutes an important step for successful fusion of large spatio-temporal information. This paper evaluates how effectively probabilistic volumetric models can be aligned using robust feature-matching techniques, while considering different scenarios that reflect the kind of variability observed across aerial video collections from different time instances. More precisely, this work investigates variability in terms of discretization, resolution and sampling density, errors in the camera orientation, and changes in illumination and geographic characteristics. All results are given for large-scale, outdoor sites. In order to facilitate the comparison of the registration performance of {PVMs} to that of other 3-d reconstruction techniques, the registration pipeline is also carried out using Patch-based Multi-View Stereo (PMVS) algorithm. Registration performance is similar for scenes that have favorable geometry and the appearance characteristics necessary for high quality reconstruction. In scenes containing trees, such as a park, or many buildings, such as a city center, registration performance is significantly more accurate when using the PVM.

Publisher site link (url) DOI [BibTex]

Publisher site link (url) DOI [BibTex]


Thumb xl freelymoving2
A freely-moving monkey treadmill model

Foster, J., Nuyujukian, P., Freifeld, O., Gao, H., Walker, R., Ryu, S., Meng, T., Murmann, B., Black, M., Shenoy, K.

J. of Neural Engineering, 11(4):046020, 2014 (article)

Abstract
Objective: Motor neuroscience and brain-machine interface (BMI) design is based on examining how the brain controls voluntary movement, typically by recording neural activity and behavior from animal models. Recording technologies used with these animal models have traditionally limited the range of behaviors that can be studied, and thus the generality of science and engineering research. We aim to design a freely-moving animal model using neural and behavioral recording technologies that do not constrain movement. Approach: We have established a freely-moving rhesus monkey model employing technology that transmits neural activity from an intracortical array using a head-mounted device and records behavior through computer vision using markerless motion capture. We demonstrate the excitability and utility of this new monkey model, including the fi rst recordings from motor cortex while rhesus monkeys walk quadrupedally on a treadmill. Main results: Using this monkey model, we show that multi-unit threshold-crossing neural activity encodes the phase of walking and that the average ring rate of the threshold crossings covaries with the speed of individual steps. On a population level, we find that neural state-space trajectories of walking at diff erent speeds have similar rotational dynamics in some dimensions that evolve at the step rate of walking, yet robustly separate by speed in other state-space dimensions. Significance: Freely-moving animal models may allow neuroscientists to examine a wider range of behaviors and can provide a flexible experimental paradigm for examining the neural mechanisms that underlie movement generation across behaviors and environments. For BMIs, freely-moving animal models have the potential to aid prosthetic design by examining how neural encoding changes with posture, environment, and other real-world context changes. Understanding this new realm of behavior in more naturalistic settings is essential for overall progress of basic motor neuroscience and for the successful translation of BMIs to people with paralysis.

pdf Supplementary DOI Project Page [BibTex]

pdf Supplementary DOI Project Page [BibTex]


Thumb xl dissertation teaser scaled
Human Pose Estimation from Video and Inertial Sensors

Pons-Moll, G.

Ph.D Thesis, -, 2014 (book)

Abstract
The analysis and understanding of human movement is central to many applications such as sports science, medical diagnosis and movie production. The ability to automatically monitor human activity in security sensitive areas such as airports, lobbies or borders is of great practical importance. Furthermore, automatic pose estimation from images leverages the processing and understanding of massive digital libraries available on the Internet. We build upon a model based approach where the human shape is modelled with a surface mesh and the motion is parametrized by a kinematic chain. We then seek for the pose of the model that best explains the available observations coming from different sensors. In a first scenario, we consider a calibrated mult-iview setup in an indoor studio. To obtain very accurate results, we propose a novel tracker that combines information coming from video and a small set of Inertial Measurement Units (IMUs). We do so by locally optimizing a joint energy consisting of a term that measures the likelihood of the video data and a term for the IMU data. This is the first work to successfully combine video and IMUs information for full body pose estimation. When compared to commercial marker based systems the proposed solution is more cost efficient and less intrusive for the user. In a second scenario, we relax the assumption of an indoor studio and we tackle outdoor scenes with background clutter, illumination changes, large recording volumes and difficult motions of people interacting with objects. Again, we combine information from video and IMUs. Here we employ a particle based optimization approach that allows us to be more robust to tracking failures. To satisfy the orientation constraints imposed by the IMUs, we derive an analytic Inverse Kinematics (IK) procedure to sample from the manifold of valid poses. The generated hypothesis come from a lower dimensional manifold and therefore the computational cost can be reduced. Experiments on challenging sequences suggest the proposed tracker can be applied to capture in outdoor scenarios. Furthermore, the proposed IK sampling procedure can be used to integrate any kind of constraints derived from the environment. Finally, we consider the most challenging possible scenario: pose estimation of monocular images. Here, we argue that estimating the pose to the degree of accuracy as in an engineered environment is too ambitious with the current technology. Therefore, we propose to extract meaningful semantic information about the pose directly from image features in a discriminative fashion. In particular, we introduce posebits which are semantic pose descriptors about the geometric relationships between parts in the body. The experiments show that the intermediate step of inferring posebits from images can improve pose estimation from monocular imagery. Furthermore, posebits can be very useful as input feature for many computer vision algorithms.

pdf [BibTex]


no image
Left Ventricle Segmentation by Dynamic Shape Constrained Random Walk

X. Yang, Y. Su, M. Wan, S. Y. Yeo, C. Lim, S. T. Wong, L. Zhong, R. S. Tan

In Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2014 (inproceedings)

Abstract
Accurate and robust extraction of the left ventricle (LV) cavity is a key step for quantitative analysis of cardiac functions. In this study, we propose an improved LV cavity segmentation method that incorporates a dynamic shape constraint into the weighting function of the random walks algorithm. The method involves an iterative process that updates an intermediate result to the desired solution. The shape constraint restricts the solution space of the segmentation result, such that the robustness of the algorithm is increased to handle misleading information that emanates from noise, weak boundaries, and clutter. Our experiments on real cardiac magnetic resonance images demonstrate that the proposed method obtains better segmentation performance than standard method.

[BibTex]

[BibTex]


Thumb xl tang14ijcv
Detection and Tracking of Occluded People

Tang, S., Andriluka, M., Schiele, B.

International Journal of Computer Vision, 110, pages: 58-69, 2014 (article)

PDF [BibTex]

PDF [BibTex]


Thumb xl jnb1
Segmentation of Biomedical Images Using Active Contour Model with Robust Image Feature and Shape Prior

S. Y. Yeo, X. Xie, I. Sazonov, P. Nithiarasu

International Journal for Numerical Methods in Biomedical Engineering, 30(2):232- 248, 2014 (article)

Abstract
In this article, a new level set model is proposed for the segmentation of biomedical images. The image energy of the proposed model is derived from a robust image gradient feature which gives the active contour a global representation of the geometric configuration, making it more robust in dealing with image noise, weak edges, and initial configurations. Statistical shape information is incorporated using nonparametric shape density distribution, which allows the shape model to handle relatively large shape variations. The segmentation of various shapes from both synthetic and real images depict the robustness and efficiency of the proposed method.

[BibTex]

[BibTex]


Thumb xl simulated annealing
Simulated Annealing

Gall, J.

In Encyclopedia of Computer Vision, pages: 737-741, 0, (Editors: Ikeuchi, K. ), Springer Verlag, 2014, to appear (inbook)

[BibTex]

[BibTex]


Thumb xl ijcvflow2
A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles behind Them

Sun, D., Roth, S., Black, M. J.

International Journal of Computer Vision (IJCV), 106(2):115-137, 2014 (article)

Abstract
The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that "classical'' flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. One key implementation detail is the median filtering of intermediate flow fields during optimization. While this improves the robustness of classical methods it actually leads to higher energy solutions, meaning that these methods are not optimizing the original objective function. To understand the principles behind this phenomenon, we derive a new objective function that formalizes the median filtering heuristic. This objective function includes a non-local smoothness term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that can better preserve motion details. To take advantage of the trend towards video in wide-screen format, we further introduce an asymmetric pyramid downsampling scheme that enables the estimation of longer range horizontal motions. The methods are evaluated on Middlebury, MPI Sintel, and KITTI datasets using the same parameter settings.

pdf full text code [BibTex]


Thumb xl glsn1
Automatic 4D Reconstruction of Patient-Specific Cardiac Mesh with 1- to-1 Vertex Correspondence from Segmented Contours Lines

C. W. Lim, Y. Su, S. Y. Yeo, G. M. Ng, V. T. Nguyen, L. Zhong, R. S. Tan, K. K. Poh, P. Chai,

PLOS ONE, 9(4), 2014 (article)

Abstract
We propose an automatic algorithm for the reconstruction of patient-specific cardiac mesh models with 1-to-1 vertex correspondence. In this framework, a series of 3D meshes depicting the endocardial surface of the heart at each time step is constructed, based on a set of border delineated magnetic resonance imaging (MRI) data of the whole cardiac cycle. The key contribution in this work involves a novel reconstruction technique to generate a 4D (i.e., spatial–temporal) model of the heart with 1-to-1 vertex mapping throughout the time frames. The reconstructed 3D model from the first time step is used as a base template model and then deformed to fit the segmented contours from the subsequent time steps. A method to determine a tree-based connectivity relationship is proposed to ensure robust mapping during mesh deformation. The novel feature is the ability to handle intra- and inter-frame 2D topology changes of the contours, which manifests as a series of merging and splitting of contours when the images are viewed either in a spatial or temporal sequence. Our algorithm has been tested on five acquisitions of cardiac MRI and can successfully reconstruct the full 4D heart model in around 30 minutes per subject. The generated 4D heart model conforms very well with the input segmented contours and the mesh element shape is of reasonably good quality. The work is important in the support of downstream computational simulation activities.

[BibTex]

[BibTex]

2013


Thumb xl iccv2013 siyu
Learning People Detectors for Tracking in Crowded Scenes

Tang, S., Andriluka, M., Milan, A., Schindler, K., Roth, S., Schiele, B.

In 2013 IEEE International Conference on Computer Vision, pages: 1049-1056, IEEE, December 2013 (inproceedings)

PDF DOI [BibTex]

2013

PDF DOI [BibTex]


Thumb xl thumb
Branch&Rank for Efficient Object Detection

Lehmann, A., Gehler, P., VanGool, L.

International Journal of Computer Vision, Springer, December 2013 (article)

Abstract
Ranking hypothesis sets is a powerful concept for efficient object detection. In this work, we propose a branch&rank scheme that detects objects with often less than 100 ranking operations. This efficiency enables the use of strong and also costly classifiers like non-linear SVMs with RBF-TeX kernels. We thereby relieve an inherent limitation of branch&bound methods as bounds are often not tight enough to be effective in practice. Our approach features three key components: a ranking function that operates on sets of hypotheses and a grouping of these into different tasks. Detection efficiency results from adaptively sub-dividing the object search space into decreasingly smaller sets. This is inherited from branch&bound, while the ranking function supersedes a tight bound which is often unavailable (except for rather limited function classes). The grouping makes the system effective: it separates image classification from object recognition, yet combines them in a single formulation, phrased as a structured SVM problem. A novel aspect of branch&rank is that a better ranking function is expected to decrease the number of classifier calls during detection. We use the VOC’07 dataset to demonstrate the algorithmic properties of branch&rank.

pdf link (url) [BibTex]

pdf link (url) [BibTex]


Thumb xl thumb
Strong Appearance and Expressive Spatial Models for Human Pose Estimation

Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.

In International Conference on Computer Vision (ICCV), pages: 3487 - 3494 , IEEE, December 2013 (inproceedings)

Abstract
Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structure spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the "Leeds Sports Poses'' and "Parse'' benchmarks.

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


Thumb xl screenshot area 2015 07 27 004304
Methods and Applications for Distance Based ANN Training

Lassner, C., Lienhart, R.

In IEEE International Conference on Machine Learning and Applications (ICMLA), December 2013 (inproceedings)

Abstract
Feature learning has the aim to take away the hassle of hand-designing features for machine learning tasks. Since the feature design process is tedious and requires a lot of experience, an automated solution is of great interest. However, an important problem in this field is that usually no objective values are available to fit a feature learning function to. Artificial Neural Networks are a sufficiently flexible tool for function approximation to be able to avoid this problem. We show how the error function of an ANN can be modified such that it works solely with objective distances instead of objective values. We derive the adjusted rules for backpropagation through networks with arbitrary depths and include practical considera- tions that must be taken into account to apply difference based learning successfully. On all three benchmark datasets we use, linear SVMs trained on automatically learned ANN features outperform RBF kernel SVMs trained on the raw data. This can be achieved in a feature space with up to only a tenth of dimensions of the number of original data dimensions. We conclude our work with two experiments on distance based ANN training in two further fields: data visualization and outlier detection.

pdf [BibTex]

pdf [BibTex]


Thumb xl tro
Extracting Postural Synergies for Robotic Grasping

Romero, J., Feix, T., Ek, C., Kjellstrom, H., Kragic, D.

Robotics, IEEE Transactions on, 29(6):1342-1352, December 2013 (article)

[BibTex]

[BibTex]


Thumb xl zhang
Understanding High-Level Semantics by Modeling Traffic Patterns

Zhang, H., Geiger, A., Urtasun, R.

In International Conference on Computer Vision, pages: 3056-3063, Sydney, Australia, December 2013 (inproceedings)

Abstract
In this paper, we are interested in understanding the semantics of outdoor scenes in the context of autonomous driving. Towards this goal, we propose a generative model of 3D urban scenes which is able to reason not only about the geometry and objects present in the scene, but also about the high-level semantics in the form of traffic patterns. We found that a small number of patterns is sufficient to model the vast majority of traffic scenes and show how these patterns can be learned. As evidenced by our experiments, this high-level reasoning significantly improves the overall scene estimation as well as the vehicle-to-lane association when compared to state-of-the-art approaches. All data and code will be made available upon publication.

pdf [BibTex]

pdf [BibTex]


Thumb xl thumb
A Non-parametric Bayesian Network Prior of Human Pose

Lehrmann, A. M., Gehler, P., Nowozin, S.

In Proceedings IEEE Conf. on Computer Vision (ICCV), pages: 1281-1288, December 2013 (inproceedings)

Abstract
Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model's ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.

Project page pdf DOI Project Page [BibTex]

Project page pdf DOI Project Page [BibTex]


Thumb xl jhuang
Towards understanding action recognition

Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M. J.

In IEEE International Conference on Computer Vision (ICCV), pages: 3192-3199, IEEE, Sydney, Australia, December 2013 (inproceedings)

Abstract
Although action recognition in videos is widely studied, current methods often fail on real-world datasets. Many recent approaches improve accuracy and robustness to cope with challenging video sequences, but it is often unclear what affects the results most. This paper attempts to provide insights based on a systematic performance evaluation using thoroughly-annotated data of human actions. We annotate human Joints for the HMDB dataset (J-HMDB). This annotation can be used to derive ground truth optical flow and segmentation. We evaluate current methods using this dataset and systematically replace the output of various algorithms with ground truth. This enables us to discover what is important – for example, should we work on improving flow algorithms, estimating human bounding boxes, or enabling pose estimation? In summary, we find that highlevel pose features greatly outperform low/mid level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information. We also find that the accuracy of a top-performing action recognition framework can be greatly increased by refining the underlying low/mid level features; this suggests it is important to improve optical flow and human detection algorithms. Our analysis and JHMDB dataset should facilitate a deeper understanding of action recognition algorithms.

Website Errata Poster Paper Slides DOI Project Page Project Page Project Page [BibTex]

Website Errata Poster Paper Slides DOI Project Page Project Page Project Page [BibTex]


Thumb xl embs2013
Mixing Decoded Cursor Velocity and Position from an Offline Kalman Filter Improves Cursor Control in People with Tetraplegia

Homer, M., Harrison, M., Black, M. J., Perge, J., Cash, S., Friehs, G., Hochberg, L.

In 6th International IEEE EMBS Conference on Neural Engineering, pages: 715-718, San Diego, November 2013 (inproceedings)

Abstract
Kalman filtering is a common method to decode neural signals from the motor cortex. In clinical research investigating the use of intracortical brain computer interfaces (iBCIs), the technique enabled people with tetraplegia to control assistive devices such as a computer or robotic arm directly from their neural activity. For reaching movements, the Kalman filter typically estimates the instantaneous endpoint velocity of the control device. Here, we analyzed attempted arm/hand movements by people with tetraplegia to control a cursor on a computer screen to reach several circular targets. A standard velocity Kalman filter is enhanced to additionally decode for the cursor’s position. We then mix decoded velocity and position to generate cursor movement commands. We analyzed data, offline, from two participants across six sessions. Root mean squared error between the actual and estimated cursor trajectory improved by 12.2 ±10.5% (pairwise t-test, p<0.05) as compared to a standard velocity Kalman filter. The findings suggest that simultaneously decoding for intended velocity and position and using them both to generate movement commands can improve the performance of iBCIs.

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl pic cviu13
Markov Random Field Modeling, Inference & Learning in Computer Vision & Image Understanding: A Survey

Wang, C., Komodakis, N., Paragios, N.

Computer Vision and Image Understanding (CVIU), 117(11):1610-1627, November 2013 (article)

Abstract
In this paper, we present a comprehensive survey of Markov Random Fields (MRFs) in computer vision and image understanding, with respect to the modeling, the inference and the learning. While MRFs were introduced into the computer vision field about two decades ago, they started to become a ubiquitous tool for solving visual perception problems around the turn of the millennium following the emergence of efficient inference methods. During the past decade, a variety of MRF models as well as inference and learning methods have been developed for addressing numerous low, mid and high-level vision problems. While most of the literature concerns pairwise MRFs, in recent years we have also witnessed significant progress in higher-order MRFs, which substantially enhances the expressiveness of graph-based models and expands the domain of solvable problems. This survey provides a compact and informative summary of the major literature in this research topic.

Publishers site pdf [BibTex]

Publishers site pdf [BibTex]


no image
Multi-robot cooperative spherical-object tracking in 3D space based on particle filters

Ahmad, A., Lima, P.

Robotics and Autonomous Systems, 61(10):1084-1093, October 2013 (article)

Abstract
This article presents a cooperative approach for tracking a moving spherical object in 3D space by a team of mobile robots equipped with sensors, in a highly dynamic environment. The tracker’s core is a particle filter, modified to handle, within a single unified framework, the problem of complete or partial occlusion for some of the involved mobile sensors, as well as inconsistent estimates in the global frame among sensors, due to observation errors and/or self-localization uncertainty. We present results supporting our approach by applying it to a team of real soccer robots tracking a soccer ball, including comparison with ground truth.

DOI [BibTex]

DOI [BibTex]


no image
Multi-Robot Cooperative Object Tracking Based on Particle Filters

Ahmad, A., Lima, P.

In Robotics and Autonomous Systems, 61(10):1084-1093, October 2013 (inproceedings)

Abstract
This article presents a cooperative approach for tracking a moving object by a team of mobile robots equipped with sensors, in a highly dynamic environment. The tracker’s core is a particle filter, modified to handle, within a single unified framework, the problem of complete or partial occlusion for some of the involved mobile sensors, as well as inconsistent estimates in the global frame among sensors, due to observation errors and/or self-localization uncertainty. We present results supporting our approach by applying it to a team of real soccer robots tracking a soccer ball.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Thumb xl implied flow whue
Puppet Flow

Zuffi, S., Black, M. J.

(7), Max Planck Institute for Intelligent Systems, October 2013 (techreport)

Abstract
We introduce Puppet Flow (PF), a layered model describing the optical flow of a person in a video sequence. We consider video frames composed by two layers: a foreground layer corresponding to a person, and background. We model the background as an affine flow field. The foreground layer, being a moving person, requires reasoning about the articulated nature of the human body. We thus represent the foreground layer with the Deformable Structures model (DS), a parametrized 2D part-based human body representation. We call the motion field defined through articulated motion and deformation of the DS model, a Puppet Flow. By exploiting the DS representation, Puppet Flow is a parametrized optical flow field, where parameters are the person's pose, gender and body shape.

pdf Project Page Project Page [BibTex]

pdf Project Page Project Page [BibTex]


no image
D2.1.4 RoCKIn@Work - Innovation in Mobile Industrial Manipulation Competition Design, Rule Book, and Scenario Construction

Ahmad, A., Awaad, I., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schneider, S.

(FP7-ICT-601012 Revision 0.7), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, sep 2013 (techreport)

Abstract
RoCKIn is a EU-funded project aiming to foster scientific progress and innovation in cognitive systems and robotics through the design and implementation of competitions. An additional objective of RoCKIn is to increase public awareness of the current state-of-the-art in robotics in Europe and to demonstrate the innovation potential of robotics applications for solving societal challenges and improving the competitiveness of Europe in the global markets. In order to achieve these objectives, RoCKIn develops two competitions, one for domestic service robots (RoCKIn@Home) and one for industrial robots in factories (RoCKIn-@Work). These competitions are designed around challenges that are based on easy-to-communicate and convincing user stories, which catch the interest of both the general public and the scientifc community. The latter is in particular interested in solving open scientific challenges and to thoroughly assess, compare, and evaluate the developed approaches with competing ones. To allow this to happen, the competitions are designed to meet the requirements of benchmarking procedures and good experimental methods. The integration of benchmarking technology with the competition concept is one of the main objectives of RoCKIn. This document describes the first version of the RoCKIn@Work competition, which will be held for the first time in 2014. The first chapter of the document gives a brief overview, outlining the purpose and objective of the competition, the methodological approach taken by the RoCKIn project, the user story upon which the competition is based, the structure and organization of the competition, and the commonalities and differences with the RoboCup@Work competition, which served as inspiration for RoCKIn@Work. The second chapter provides details on the user story and analyzes the scientific and technical challenges it poses. Consecutive chapters detail the competition scenario, the competition design, and the organization of the competition. The appendices contain information on a library of functionalities, which we believe are needed, or at least useful, for building competition entries, details on the scenario construction, and a detailed account of the benchmarking infrastructure needed — and provided by RoCKIn.

[BibTex]

[BibTex]


no image
D2.1.1 RoCKIn@Home - A Competition for Domestic Service Robots Competition Design, Rule Book, and Scenario Construction

Ahmad, A., Awaad, I., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schneider, S.

(FP7-ICT-601012 Revision 0.7), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, sep 2013 (techreport)

Abstract
RoCKIn is a EU-funded project aiming to foster scientific progress and innovation in cognitive systems and robotics through the design and implementation of competitions. An additional objective of RoCKIn is to increase public awareness of the current state-of-the-art in robotics in Europe and to demonstrate the innovation potential of robotics applications for solving societal challenges and improving the competitiveness of Europe in the global markets. In order to achieve these objectives, RoCKIn develops two competitions, one for domestic service robots (RoCKIn@Home) and one for industrial robots in factories (RoCKIn-@Work). These competitions are designed around challenges that are based on easy-to-communicate and convincing user stories, which catch the interest of both the general public and the scientifc community. The latter is in particular interested in solving open scientific challenges and to thoroughly assess, compare, and evaluate the developed approaches with competing ones. To allow this to happen, the competitions are designed to meet the requirements of benchmarking procedures and good experimental methods. The integration of benchmarking technology with the competition concept is one of the main objectives of RoCKIn. This document describes the first version of the RoCKIn@Home competition, which will be held for the first time in 2014. The first chapter of the document gives a brief overview, outlining the purpose and objective of the competition, the methodological approach taken by the RoCKIn project, the user story upon which the competition is based, the structure and organization of the competition, and the commonalities and differences with the RoboCup@Home competition, which served as inspiration for RoCKIn@Home. The second chapter provides details on the user story and analyzes the scientific and technical challenges it poses. Consecutive chapters detail the competition scenario, the competition design, and the organization of the competition. The appendices contain information on a library of functionalities, which we believe are needed, or at least useful, for building competition entries, details on the scenario construction, and a detailed account of the benchmarking infrastructure needed — and provided by RoCKIn.

[BibTex]

[BibTex]


Thumb xl bmvc teaser
Distribution Fields with Adaptive Kernels for Large Displacement Image Alignment

Mears, B., Sevilla-Lara, L., Learned-Miller, E.

In British Machine Vision Conference (BMVC) , BMVA Press, September 2013 (inproceedings)

Abstract
While region-based image alignment algorithms that use gradient descent can achieve sub-pixel accuracy when they converge, their convergence depends on the smoothness of the image intensity values. Image smoothness is often enforced through the use of multiscale approaches in which images are smoothed and downsampled. Yet, these approaches typically use fixed smoothing parameters which may be appropriate for some images but not for others. Even for a particular image, the optimal smoothing parameters may depend on the magnitude of the transformation. When the transformation is large, the image should be smoothed more than when the transformation is small. Further, with gradient-based approaches, the optimal smoothing parameters may change with each iteration as the algorithm proceeds towards convergence. We address convergence issues related to the choice of smoothing parameters by deriving a Gauss-Newton gradient descent algorithm based on distribution fields (DFs) and proposing a method to dynamically select smoothing parameters at each iteration. DF and DF-like representations have previously been used in the context of tracking. In this work we incorporate DFs into a full affine model for region-based alignment and simultaneously search over parameterized sets of geometric and photometric transforms. We use a probabilistic interpretation of DFs to select smoothing parameters at each step in the optimization and show that this results in improved convergence rates.

pdf code [BibTex]

pdf code [BibTex]


Thumb xl teaser mrg
Metric Regression Forests for Human Pose Estimation

(Best Science Paper Award)

Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., Fitzgibbon, A.

In British Machine Vision Conference (BMVC) , BMVA Press, September 2013 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Thumb xl ijrr
Vision meets Robotics: The KITTI Dataset

Geiger, A., Lenz, P., Stiller, C., Urtasun, R.

International Journal of Robotics Research, 32(11):1231 - 1237 , Sage Publishing, September 2013 (article)

Abstract
We present a novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research. In total, we recorded 6 hours of traffic scenarios at 10-100 Hz using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. The scenarios are diverse, capturing real-world traffic situations and range from freeways over rural areas to inner-city scenes with many static and dynamic objects. Our data is calibrated, synchronized and timestamped, and we provide the rectified and raw image sequences. Our dataset also contains object labels in the form of 3D tracklets and we provide online benchmarks for stereo, optical flow, object detection and other tasks. This paper describes our recording platform, the data format and the utilities that we provide.

pdf DOI [BibTex]

pdf DOI [BibTex]


Thumb xl imgf0006
Human Pose Calculation from Optical Flow Data

Black, M., Loper, M., Romero, J., Zuffi, S.

European Patent Application EP 2843621 , August 2013 (patent)

Google Patents [BibTex]

Google Patents [BibTex]