

2017


Learning a model of facial shape and expression from 4D scans

Li, T., Bolkart, T., Black, M. J., Li, H., Romero, J.

ACM Transactions on Graphics, 36(6):194:1-194:17, November 2017, Two first authors contributed equally (article)

Abstract
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression blendshapes learned from 4D face sequences in the D3DFACS dataset along with additional 4D sequences. We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33,000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]



Investigating Body Image Disturbance in Anorexia Nervosa Using Novel Biometric Figure Rating Scales: A Pilot Study

Mölbert, S. C., Thaler, A., Streuber, S., Black, M. J., Karnath, H., Zipfel, S., Mohler, B., Giel, K. E.

European Eating Disorders Review, 25(6):607-612, November 2017 (article)

Abstract
This study uses novel biometric figure rating scales (FRS) spanning body mass index (BMI) 13.8 to 32.2 kg/m² and BMI 18 to 42 kg/m². The aims of the study were (i) to compare FRS body weight dissatisfaction and perceptual distortion of women with anorexia nervosa (AN) to a community sample; (ii) to examine how FRS parameters are associated with questionnaire body dissatisfaction, eating disorder symptoms and appearance comparison habits; and (iii) to test whether the weight spectrum of the FRS matters. Women with AN (n = 24) and a community sample of women (n = 104) selected their current and ideal body on the FRS and completed additional questionnaires. Women with AN accurately picked the body that aligned best with their actual weight in both FRS. Controls underestimated their BMI in the FRS 14–32 and were accurate in the FRS 18–42. In both FRS, women with AN desired a body close to their actual BMI and controls desired a thinner body. Our observations suggest that body image disturbance in AN is unlikely to be characterized by a visual perceptual disturbance, but rather by an idealization of underweight in conjunction with high body dissatisfaction. The weight spectrum of FRS can influence the accuracy of BMI estimation.

publisher DOI Project Page [BibTex]


Embodied Hands: Modeling and Capturing Hands and Bodies Together

Romero, J., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):245:1-245:17, ACM, November 2017 (article)

Abstract
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full body models that move naturally with detailed hand motions and a realism not seen before in full body performance capture. The models and data are freely available for research purposes at http://mano.is.tue.mpg.de.

website youtube paper suppl video link (url) DOI Project Page [BibTex]



An Online Scalable Approach to Unified Multirobot Cooperative Localization and Object Tracking

Ahmad, A., Lawless, G., Lima, P.

IEEE Transactions on Robotics (T-RO), 33, pages: 1184 - 1199, October 2017 (article)

Abstract
In this article we present a unified approach for multi-robot cooperative simultaneous localization and object tracking based on particle filters. Our approach is scalable with respect to the number of robots in the team. We introduce a method that reduces, from an exponential to a linear growth, the space and computation time requirements with respect to the number of robots in order to maintain a given level of accuracy in the full state estimation. Our method requires no increase in the number of particles with respect to the number of robots. However, in our method each particle represents a full state hypothesis, leading to the linear dependency on the number of robots of both space and time complexity. The derivation of the algorithm implementing our approach from a standard particle filter algorithm and its complexity analysis are presented. Through an extensive set of simulation experiments on a large number of randomized datasets, we demonstrate the correctness and efficacy of our approach. Through real robot experiments on a standardized open dataset of a team of four soccer playing robots tracking a ball, we evaluate our method's estimation accuracy with respect to the ground truth values. Through comparisons with other methods based on i) nonlinear least squares minimization and ii) joint extended Kalman filter, we further highlight our method's advantages. Finally, we also present a robustness test for our approach by evaluating it under scenarios of communication and vision failure in teammate robots.

Published Version link (url) DOI [BibTex]


Early Stopping Without a Validation Set

Mahsereci, M., Balles, L., Lassner, C., Hennig, P.

arXiv preprint arXiv:1703.09580, 2017 (article)

Abstract
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.
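
For context, the common validation-based practice that the abstract describes (and that the paper aims to replace) can be sketched as follows. This is only an illustration of the baseline, not the gradient-statistics criterion proposed in the paper; the function name `early_stop` and the `patience` parameter are assumptions introduced here.

```python
from typing import Sequence, Tuple


def early_stop(val_losses: Sequence[float], patience: int = 3) -> Tuple[int, int]:
    """Validation-based early stopping (hypothetical helper, for illustration).

    Scans validation losses recorded once per training step and stops when
    the loss has not improved for `patience` consecutive steps.

    Returns (best_step, stop_step): the step with the lowest validation loss
    seen so far and the step at which training would be halted.
    """
    best_loss = float("inf")
    best_step = -1
    for step, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember where it occurred.
            best_loss, best_step = loss, step
        elif step - best_step >= patience:
            # No improvement for `patience` steps: halt the optimizer.
            return best_step, step
    # Never triggered: training ran to the end of the recorded losses.
    return best_step, len(val_losses) - 1


if __name__ == "__main__":
    print(early_stop([5.0, 4.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0], patience=3))
```

Note that this baseline needs a held-out split to produce `val_losses` in the first place, which is exactly the cost the abstract says the proposed criterion removes.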

link (url) Project Page Project Page [BibTex]


Appealing Avatars from 3D Body Scans: Perceptual Effects of Stylization

Fleming, R., Mohler, B. J., Romero, J., Black, M. J., Breidt, M.

In Computer Vision, Imaging and Computer Graphics Theory and Applications: 11th International Joint Conference, VISIGRAPP 2016, Rome, Italy, February 27 – 29, 2016, Revised Selected Papers, pages: 175-196, Springer International Publishing, 2017 (inbook)

Abstract
Using styles derived from existing popular character designs, we present a novel automatic stylization technique for body shape and colour information based on a statistical 3D model of human bodies. We investigate whether such stylized body shapes result in increased perceived appeal with two different experiments: One focuses on body shape alone, the other investigates the additional role of surface colour and lighting. Our results consistently show that the most appealing avatar is a partially stylized one. Importantly, avatars with high stylization or no stylization at all were rated to have the least appeal. The inclusion of colour information and improvements to render quality had no significant effect on the overall perceived appeal of the avatars, and we observe that the body shape primarily drives the change in appeal ratings. For body scans with colour information, we found that a partially stylized avatar was perceived as most appealing.

publisher site pdf DOI [BibTex]



Learning to Filter Object Detections

Prokudin, S., Kappler, D., Nowozin, S., Gehler, P.

In Pattern Recognition: 39th German Conference, GCPR 2017, Basel, Switzerland, September 12–15, 2017, Proceedings, pages: 52-62, Springer International Publishing, Cham, 2017 (inbook)

Abstract
Most object detection systems consist of three stages. First, a set of individual hypotheses for object locations is generated using a proposal generating algorithm. Second, a classifier scores every generated hypothesis independently to obtain a multi-class prediction. Finally, all scored hypotheses are filtered via a non-differentiable and decoupled non-maximum suppression (NMS) post-processing step. In this paper, we propose a filtering network (FNet), a method which replaces NMS with a differentiable neural network that allows joint reasoning and re-scoring of the generated set of hypotheses per image. This formulation enables end-to-end training of the full object detection pipeline. First, we demonstrate that FNet, a feed-forward network architecture, is able to mimic NMS decisions, despite the sequential nature of NMS. We further analyze NMS failures and propose a loss formulation that is better aligned with the mean average precision (mAP) evaluation metric. We evaluate FNet on several standard detection datasets. Results surpass standard NMS on highly occluded settings of a synthetic overlapping MNIST dataset and show competitive behavior on PascalVOC2007 and KITTI detection benchmarks.
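
As a point of reference, the standard greedy NMS step that the abstract calls sequential and non-differentiable can be sketched as below. This is a generic textbook version under assumed conventions (boxes as `(x1, y1, x2, y2)` tuples, an IoU threshold of 0.5), not the FNet method and not code from the paper; the names `iou` and `greedy_nms` are illustrative.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    Each decision depends on the boxes already kept, which is the
    sequential behavior the abstract refers to.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(greedy_nms(demo_boxes, [0.9, 0.8, 0.7]))
```

The hard keep-or-drop decision in the loop has no useful gradient with respect to the scores, which is why a pipeline ending in this step cannot be trained end to end.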

Paper link (url) DOI Project Page [BibTex]



Data-Driven Physics for Human Soft Tissue Animation

Kim, M., Pons-Moll, G., Pujades, S., Bang, S., Kim, J., Black, M. J., Lee, S.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):54:1-54:12, 2017 (article)

Abstract
Data-driven models of human poses and soft-tissue deformations can produce very realistic results, but they only model the visible surface of the human body and cannot create skin deformation due to interactions with the environment. Physical simulations can generalize to external forces, but their parameters are difficult to control. In this paper, we present a layered volumetric human body model learned from data. Our model is composed of a data-driven inner layer and a physics-based external layer. The inner layer is driven with a volumetric statistical body model (VSMPL). The soft tissue layer consists of a tetrahedral mesh that is driven using the finite element method (FEM). Model parameters, namely the segmentation of the body into layers and the soft tissue elasticity, are learned directly from 4D registrations of humans exhibiting soft tissue deformations. The learned two-layer model is a realistic full-body avatar that generalizes to novel motions and external forces. Experiments show that the resulting avatars produce realistic results on held-out sequences and react to external forces. Moreover, the model supports the retargeting of physical properties from one avatar to another when they share the same topology.

video paper link (url) Project Page [BibTex]



Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

(Best Paper, Eurographics 2017)

Marcard, T. V., Rosenhahn, B., Black, M., Pons-Moll, G.

Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics), pages: 349-360, 2017 (article)

Abstract
We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker, Sparse Inertial Poser (SIP), enables motion capture using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall.

video pdf Project Page [BibTex]



Efficient 2D and 3D Facade Segmentation using Auto-Context

Gadde, R., Jampani, V., Marlet, R., Gehler, P.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017 (article)

Abstract
This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we describe a system that is almost domain independent and consists of standard segmentation methods. We train a sequence of boosted decision trees using auto-context features. This is learned using stacked generalization. We find that this technique performs better than, or comparably to, all previously published methods and present empirical results on all available 2D and 3D facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test-time inference.

arXiv Project Page [BibTex]



ClothCap: Seamless 4D Clothing Capture and Retargeting

Pons-Moll, G., Pujades, S., Hu, S., Black, M.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):73:1-73:15, ACM, New York, NY, USA, 2017, Two first authors contributed equally (article)

Abstract
Designing and simulating realistic clothing is challenging and, while several methods have addressed the capture of clothing from 3D scans, previous methods have been limited to single garments and simple motions, lack detail, or require specialized texture patterns. Here we address the problem of capturing regular clothing on fully dressed people in motion. People typically wear multiple pieces of clothing at a time. To estimate the shape of such clothing, track it over time, and render it believably, each garment must be segmented from the others and the body. Our ClothCap approach uses a new multi-part 3D model of clothed bodies, automatically segments each piece of clothing, estimates the naked body shape and pose under the clothing, and tracks the 3D deformations of the clothing over time. We estimate the garments and their motion from 4D scans; that is, high-resolution 3D scans of the subject in motion at 60 fps. The model allows us to capture a clothed person in motion, extract their clothing, and retarget the clothing to new body shapes. ClothCap provides a step towards virtual try-on with a technology for capturing, modeling, and analyzing clothing in motion.

video project_page paper link (url) DOI Project Page Project Page [BibTex]



Decentralized Simultaneous Multi-target Exploration using a Connected Network of Multiple Robots

Nestmeyer, T., Robuffo Giordano, P., Bülthoff, H. H., Franchi, A.

Autonomous Robots, pages: 989-1011, 2017 (incollection)

[BibTex]


2013


Branch&Rank for Efficient Object Detection

Lehmann, A., Gehler, P., Van Gool, L.

International Journal of Computer Vision, Springer, December 2013 (article)

Abstract
Ranking hypothesis sets is a powerful concept for efficient object detection. In this work, we propose a branch&rank scheme that detects objects with often less than 100 ranking operations. This efficiency enables the use of strong and also costly classifiers like non-linear SVMs with RBF-χ² kernels. We thereby relieve an inherent limitation of branch&bound methods as bounds are often not tight enough to be effective in practice. Our approach features three key components: a ranking function that operates on sets of hypotheses and a grouping of these into different tasks. Detection efficiency results from adaptively sub-dividing the object search space into decreasingly smaller sets. This is inherited from branch&bound, while the ranking function supersedes a tight bound which is often unavailable (except for rather limited function classes). The grouping makes the system effective: it separates image classification from object recognition, yet combines them in a single formulation, phrased as a structured SVM problem. A novel aspect of branch&rank is that a better ranking function is expected to decrease the number of classifier calls during detection. We use the VOC’07 dataset to demonstrate the algorithmic properties of branch&rank.

pdf link (url) [BibTex]



Extracting Postural Synergies for Robotic Grasping

Romero, J., Feix, T., Ek, C., Kjellstrom, H., Kragic, D.

IEEE Transactions on Robotics, 29(6):1342-1352, December 2013 (article)

[BibTex]



Markov Random Field Modeling, Inference & Learning in Computer Vision & Image Understanding: A Survey

Wang, C., Komodakis, N., Paragios, N.

Computer Vision and Image Understanding (CVIU), 117(11):1610-1627, November 2013 (article)

Abstract
In this paper, we present a comprehensive survey of Markov Random Fields (MRFs) in computer vision and image understanding, with respect to the modeling, the inference and the learning. While MRFs were introduced into the computer vision field about two decades ago, they started to become a ubiquitous tool for solving visual perception problems around the turn of the millennium following the emergence of efficient inference methods. During the past decade, a variety of MRF models as well as inference and learning methods have been developed for addressing numerous low, mid and high-level vision problems. While most of the literature concerns pairwise MRFs, in recent years we have also witnessed significant progress in higher-order MRFs, which substantially enhances the expressiveness of graph-based models and expands the domain of solvable problems. This survey provides a compact and informative summary of the major literature in this research topic.

Publishers site pdf [BibTex]



Multi-robot cooperative spherical-object tracking in 3D space based on particle filters

Ahmad, A., Lima, P.

Robotics and Autonomous Systems, 61(10):1084-1093, October 2013 (article)

Abstract
This article presents a cooperative approach for tracking a moving spherical object in 3D space by a team of mobile robots equipped with sensors, in a highly dynamic environment. The tracker’s core is a particle filter, modified to handle, within a single unified framework, the problem of complete or partial occlusion for some of the involved mobile sensors, as well as inconsistent estimates in the global frame among sensors, due to observation errors and/or self-localization uncertainty. We present results supporting our approach by applying it to a team of real soccer robots tracking a soccer ball, including comparison with ground truth.

DOI [BibTex]



Puppet Flow

Zuffi, S., Black, M. J.

(7), Max Planck Institute for Intelligent Systems, October 2013 (techreport)

Abstract
We introduce Puppet Flow (PF), a layered model describing the optical flow of a person in a video sequence. We consider video frames composed of two layers: a foreground layer corresponding to a person, and a background layer. We model the background as an affine flow field. The foreground layer, being a moving person, requires reasoning about the articulated nature of the human body. We thus represent the foreground layer with the Deformable Structures model (DS), a parametrized 2D part-based human body representation. We call the motion field defined through articulated motion and deformation of the DS model a Puppet Flow. By exploiting the DS representation, Puppet Flow is a parametrized optical flow field, where the parameters are the person's pose, gender and body shape.

pdf Project Page Project Page [BibTex]



D2.1.4 RoCKIn@Work - Innovation in Mobile Industrial Manipulation Competition Design, Rule Book, and Scenario Construction

Ahmad, A., Awaad, I., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schneider, S.

(FP7-ICT-601012 Revision 0.7), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, September 2013 (techreport)

Abstract
RoCKIn is an EU-funded project aiming to foster scientific progress and innovation in cognitive systems and robotics through the design and implementation of competitions. An additional objective of RoCKIn is to increase public awareness of the current state of the art in robotics in Europe and to demonstrate the innovation potential of robotics applications for solving societal challenges and improving the competitiveness of Europe in the global markets. In order to achieve these objectives, RoCKIn develops two competitions, one for domestic service robots (RoCKIn@Home) and one for industrial robots in factories (RoCKIn@Work). These competitions are designed around challenges that are based on easy-to-communicate and convincing user stories, which catch the interest of both the general public and the scientific community. The latter is in particular interested in solving open scientific challenges and in thoroughly assessing, comparing, and evaluating the developed approaches against competing ones. To allow this to happen, the competitions are designed to meet the requirements of benchmarking procedures and good experimental methods. The integration of benchmarking technology with the competition concept is one of the main objectives of RoCKIn. This document describes the first version of the RoCKIn@Work competition, which will be held for the first time in 2014. The first chapter of the document gives a brief overview, outlining the purpose and objective of the competition, the methodological approach taken by the RoCKIn project, the user story upon which the competition is based, the structure and organization of the competition, and the commonalities and differences with the RoboCup@Work competition, which served as inspiration for RoCKIn@Work. The second chapter provides details on the user story and analyzes the scientific and technical challenges it poses. Subsequent chapters detail the competition scenario, the competition design, and the organization of the competition. The appendices contain information on a library of functionalities, which we believe are needed, or at least useful, for building competition entries, details on the scenario construction, and a detailed account of the benchmarking infrastructure that is needed and is provided by RoCKIn.

[BibTex]



D2.1.1 RoCKIn@Home - A Competition for Domestic Service Robots Competition Design, Rule Book, and Scenario Construction

Ahmad, A., Awaad, I., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schneider, S.

(FP7-ICT-601012 Revision 0.7), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, September 2013 (techreport)

Abstract
RoCKIn is an EU-funded project aiming to foster scientific progress and innovation in cognitive systems and robotics through the design and implementation of competitions. An additional objective of RoCKIn is to increase public awareness of the current state of the art in robotics in Europe and to demonstrate the innovation potential of robotics applications for solving societal challenges and improving the competitiveness of Europe in the global markets. In order to achieve these objectives, RoCKIn develops two competitions, one for domestic service robots (RoCKIn@Home) and one for industrial robots in factories (RoCKIn@Work). These competitions are designed around challenges that are based on easy-to-communicate and convincing user stories, which catch the interest of both the general public and the scientific community. The latter is in particular interested in solving open scientific challenges and in thoroughly assessing, comparing, and evaluating the developed approaches against competing ones. To allow this to happen, the competitions are designed to meet the requirements of benchmarking procedures and good experimental methods. The integration of benchmarking technology with the competition concept is one of the main objectives of RoCKIn. This document describes the first version of the RoCKIn@Home competition, which will be held for the first time in 2014. The first chapter of the document gives a brief overview, outlining the purpose and objective of the competition, the methodological approach taken by the RoCKIn project, the user story upon which the competition is based, the structure and organization of the competition, and the commonalities and differences with the RoboCup@Home competition, which served as inspiration for RoCKIn@Home. The second chapter provides details on the user story and analyzes the scientific and technical challenges it poses. Subsequent chapters detail the competition scenario, the competition design, and the organization of the competition. The appendices contain information on a library of functionalities, which we believe are needed, or at least useful, for building competition entries, details on the scenario construction, and a detailed account of the benchmarking infrastructure that is needed and is provided by RoCKIn.

[BibTex]



Vision meets Robotics: The KITTI Dataset

Geiger, A., Lenz, P., Stiller, C., Urtasun, R.

International Journal of Robotics Research, 32(11):1231-1237, Sage Publishing, September 2013 (article)

Abstract
We present a novel dataset captured from a VW station wagon for use in mobile robotics and autonomous driving research. In total, we recorded 6 hours of traffic scenarios at 10-100 Hz using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. The scenarios are diverse, capturing real-world traffic situations and range from freeways over rural areas to inner-city scenes with many static and dynamic objects. Our data is calibrated, synchronized and timestamped, and we provide the rectified and raw image sequences. Our dataset also contains object labels in the form of 3D tracklets and we provide online benchmarks for stereo, optical flow, object detection and other tasks. This paper describes our recording platform, the data format and the utilities that we provide.

pdf DOI [BibTex]



D1.1 Specification of General Features of Scenarios and Robots for Benchmarking Through Competitions

Ahmad, A., Awaad, I., Amigoni, F., Berghofer, J., Bischoff, R., Bonarini, A., Dwiputra, R., Fontana, G., Hegger, F., Hochgeschwender, N., Iocchi, L., Kraetzschmar, G., Lima, P., Matteucci, M., Nardi, D., Schiaffonati, V., Schneider, S.

(FP7-ICT-601012 Revision 1.0), RoCKIn - Robot Competitions Kick Innovation in Cognitive Systems and Robotics, July 2013 (techreport)

Abstract
RoCKIn is an EU-funded project aiming to foster scientific progress and innovation in cognitive systems and robotics through the design and implementation of competitions. An additional objective of RoCKIn is to increase public awareness of the current state of the art in robotics and the innovation potential of robotics applications. From these objectives several requirements for the work performed in RoCKIn can be derived: The RoCKIn competitions must start from convincing, easy-to-communicate user stories that catch the attention of relevant stakeholders, the media, and the crowd. The user stories play the role of a mid- to long-term vision for a competition. Preferably, the user stories address economic, societal, or environmental problems. The RoCKIn competitions must pose open scientific challenges of interest to sufficiently many researchers to attract existing and new teams of robotics researchers for participation in the competition. The competitions need to promise some suitable reward, such as recognition in the scientific community, publicity for a team’s work, awards, or prize money, to justify the effort a team puts into the development of a competition entry. The competitions should be designed in such a way that they reward general, scientifically sound solutions to the challenge problems; such general solutions should score better than approaches that work only in narrowly defined contexts and are considered over-engineered. The challenges motivating the RoCKIn competitions must be broken down into suitable intermediate goals that can be reached with a limited team effort until the next competition and the project duration. The RoCKIn competitions must be well-defined and well-designed, with comprehensive rule books and instructions for the participants in order to guarantee a fair competition. The RoCKIn competitions must integrate competitions with benchmarking in order to provide comprehensive feedback for the teams about the suitability of particular functional modules, their overall architecture, and system integration. This document takes the first steps towards the RoCKIn goals. After outlining our approach, we present several user stories for further discussion within the community. The main objectives of this document are to identify and document relevant scenario features and the tasks and functionalities subject to benchmarking in the competitions.

[BibTex]



SocRob-MSL 2013 Team Description Paper for Middle Sized League

Messias, J., Ahmad, A., Reis, J., Serafim, M., Lima, P.

17th Annual RoboCup International Symposium 2013, July 2013 (techreport)

Abstract
This paper describes the status of the SocRob MSL robotic soccer team as required by the RoboCup 2013 qualification procedures. The team’s latest scientific and technical developments, since its last participation in RoboCup MSL, include further advances in cooperative perception; novel communication methods for distributed robotics; progressive deployment of the ROS middleware; improved localization through feature tracking and Mixture MCL; novel planning methods based on Petri nets and decision-theoretic frameworks; and hardware developments in ball-handling/kicking devices.

link (url) [BibTex]



Visualizing dimensionality reduction of systems biology data

Lehrmann, A. M., Huber, M., Polatkan, A. C., Pritzkau, A., Nieselt, K.

Data Mining and Knowledge Discovery, 1(27):146-165, Springer, July 2013 (article)

pdf SpRay [BibTex]



Unscented Kalman Filtering on Riemannian Manifolds

Soren Hauberg, Francois Lauze, Kim S. Pedersen

Journal of Mathematical Imaging and Vision, 46(1):103-120, Springer Netherlands, May 2013 (article)

Publishers site PDF [BibTex]



Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

Journal of Machine Learning Research, 14(1):843-865, March 2013 (article)

Abstract
Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.

website+code pdf link (url) [BibTex]



A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles Behind Them

Sun, D., Roth, S., Black, M. J.

(CS-10-03), Brown University, Department of Computer Science, January 2013 (techreport)

pdf [BibTex]



Simultaneous Cast Shadows, Illumination and Geometry Inference Using Hypergraphs

Panagopoulos, A., Wang, C., Samaras, D., Paragios, N.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(2):437-449, 2013 (article)

pdf [BibTex]



Modeling Shapes with Higher-Order Graphs: Theory and Applications

Wang, C., Zeng, Y., Samaras, D., Paragios, N.

In Shape Perception in Human and Computer Vision: An Interdisciplinary Perspective, (Editors: Zygmunt Pizlo and Sven Dickinson), Springer, 2013 (incollection)

Publishers site [BibTex]



Random Forests for Real Time 3D Face Analysis

Fanelli, G., Dantone, M., Gall, J., Fossati, A., van Gool, L.

International Journal of Computer Vision, 101(3):437-458, Springer, 2013 (article)

Abstract
We present a random forest-based framework for real time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach, where each patch extracted from the depth image can directly cast a vote for the head pose or each of the facial features. Our system proves capable of handling large rotations, partial occlusions, and the noisy depth data acquired using commercial sensors. Moreover, the algorithm works on each frame independently and achieves real time performance without resorting to parallel computations on a GPU. We present extensive experiments on publicly available, challenging datasets and present a new annotated head pose database recorded using a Microsoft Kinect.

data and code publisher's site pdf DOI Project Page [BibTex]



Markerless Motion Capture of Multiple Characters Using Multi-view Image Segmentation

Liu, Y., Gall, J., Stoll, C., Dai, Q., Seidel, H., Theobalt, C.

Transactions on Pattern Analysis and Machine Intelligence, 35(11):2720-2735, 2013 (article)

Abstract
Capturing the skeleton motion and detailed time-varying surface geometry of multiple, closely interacting people is a very challenging task, even in a multicamera setup, due to frequent occlusions and ambiguities in feature-to-person assignments. To address this task, we propose a framework that exploits multiview image segmentation. To this end, a probabilistic shape and appearance model is employed to segment the input images and to assign each pixel uniquely to one person. Given the articulated template models of each person and the labeled pixels, a combined optimization scheme, which splits the skeleton pose optimization problem into a local one and a lower dimensional global one, is applied one by one to each individual, followed by surface estimation to capture detailed nonrigid deformations. We show on various sequences that our approach can capture the 3D motion of humans accurately even if they move rapidly, if they wear wide apparel, and if they are engaged in challenging multiperson motions, including dancing, wrestling, and hugging.

data and video pdf DOI Project Page [BibTex]



Viewpoint and pose in body-form adaptation

Sekunova, A., Black, M., Parkinson, L., Barton, J. J. S.

Perception, 42(2):176-186, 2013 (article)

Abstract
Faces and bodies are complex structures, perception of which can play important roles in person identification and inference of emotional state. Face representations have been explored using behavioural adaptation: in particular, studies have shown that face aftereffects show relatively broad tuning for viewpoint, consistent with origin in a high-level structural descriptor far removed from the retinal image. Our goals were to determine first, if body aftereffects also showed a degree of viewpoint invariance, and second if they also showed pose invariance, given that changes in pose create even more dramatic changes in the 2-D retinal image. We used a 3-D model of the human body to generate headless body images, whose parameters could be varied to generate different body forms, viewpoints, and poses. In the first experiment, subjects adapted to varying viewpoints of either slim or heavy bodies in a neutral stance, followed by test stimuli that were all front-facing. In the second experiment, we used the same front-facing bodies in neutral stance as test stimuli, but compared adaptation from bodies in the same neutral stance to adaptation with the same bodies in different poses. We found that body aftereffects were obtained over substantial viewpoint changes, with no significant decline in aftereffect magnitude with increasing viewpoint difference between adapting and test images. Aftereffects also showed transfer across one change in pose but not across another. We conclude that body representations may have more viewpoint invariance than faces, and demonstrate at least some transfer across pose, consistent with a high-level structural description. Keywords: aftereffect, shape, face, representation

pdf from publisher abstract pdf link (url) Project Page [BibTex]



Class-Specific Hough Forests for Object Detection

Gall, J., Lempitsky, V.

In Decision Forests for Computer Vision and Medical Image Analysis, pages: 143-157, 11, (Editors: Criminisi, A. and Shotton, J.), Springer, 2013 (incollection)

code Project Page [BibTex]



Image Gradient Based Level Set Methods in 2D and 3D

Xianghua Xie, Si Yong Yeo, Majid Mirmehdi, Igor Sazonov, Perumal Nithiarasu

In Deformation Models: Tracking, Animation and Applications, pages: 101-120, 0, (Editors: Manuel González Hidalgo and Arnau Mir Torres and Javier Varona Gómez), Springer, 2013 (inbook)

Abstract
This chapter presents an image gradient based approach to perform 2D and 3D deformable model segmentation using level set. The 2D method uses an external force field that is based on magnetostatics and hypothesized magnetic interactions between the active contour and object boundaries. The major contribution of the method is that the interaction of its forces can greatly improve the active contour in capturing complex geometries and dealing with difficult initializations, weak edges and broken boundaries. This method is then generalized to 3D by reformulating its external force based on geometrical interactions between the relative geometries of the deformable model and the object boundary characterized by image gradient. The evolution of the deformable model is solved using the level set method so that topological changes are handled automatically. The relative geometrical configurations between the deformable model and the object boundaries contribute to a dynamic vector force field that changes accordingly as the deformable model evolves. The geometrically induced dynamic interaction force has been shown to greatly improve the deformable model performance in acquiring complex geometries and highly concave boundaries, and it gives the deformable model a high invariancy in initialization configurations. The voxel interactions across the whole image domain provide a global view of the object boundary representation, giving the external force a long attraction range. The bidirectionality of the external force field allows the new deformable model to deal with arbitrary cross-boundary initializations, and facilitates the handling of weak edges and broken boundaries.

[BibTex]



Non-parametric hand pose estimation with object context

Romero, J., Kjellström, H., Ek, C. H., Kragic, D.

Image and Vision Computing, 31(8):555-564, 2013 (article)

Abstract
In the spirit of recent work on contextual recognition and estimation, we present a method for estimating the pose of human hands, employing information about the shape of the object in the hand. Despite the fact that most applications of human hand tracking involve grasping and manipulation of objects, the majority of methods in the literature assume a free hand, isolated from the surrounding environment. Occlusion of the hand by grasped objects does in fact often pose a severe challenge to the estimation of hand pose. In the presented method, object occlusion is not only compensated for, it contributes to the pose estimation in a contextual fashion, and it does so without an explicit model of object shape. Our hand tracking method is non-parametric, performing a nearest neighbor search in a large database (.. entries) of hand poses with and without grasped objects. The system operates in real time, is robust to self occlusions, object occlusions and segmentation errors, and provides full hand pose reconstruction from monocular video. Temporal consistency in hand pose is taken into account, without explicitly tracking the hand in the high-dimensional pose space. Experiments show the non-parametric method to outperform other state of the art regression methods, while operating at a significantly lower computational cost than comparable model-based hand tracking methods.

Publisher site pdf link (url) [BibTex]


2011


High-quality reflection separation using polarized images

Kong, N., Tai, Y., Shin, S. Y.

IEEE Transactions on Image Processing, 20(12):3393-3405, IEEE Signal Processing Society, December 2011 (article)

Abstract
In this paper, we deal with a problem of separating the effect of reflection from images captured behind glass. The input consists of multiple polarized images captured from the same view point but with different polarizer angles. The output is the high quality separation of the reflection layer and the background layer from the images. We formulate this problem as a constrained optimization problem and propose a framework that allows us to fully exploit the mutually exclusive image information in our input data. We test our approach on various images and demonstrate that our approach can generate good reflection separation results.

Publisher site [BibTex]



A human inspired gaze estimation system

Wulff, J., Sinha, P.

Journal of Vision, 11(11):507-507, ARVO, September 2011 (article)

Abstract
Estimating another person's gaze is a crucial skill in human social interactions. The social component is most apparent in dyadic gaze situations, in which the looker seems to look into the eyes of the observer, thereby signaling interest or a turn to speak. In a triadic situation, on the other hand, the looker's gaze is averted from the observer and directed towards another, specific target. This is mostly interpreted as a cue for joint attention, creating awareness of a predator or another point of interest. In keeping with the task's social significance, humans are very proficient at gaze estimation. Our accuracy ranges from less than one degree for dyadic settings to approximately 2.5 degrees for triadic ones. Our goal in this work is to draw inspiration from human gaze estimation mechanisms in order to create an artificial system that can approach the former's accuracy levels. Since human performance is severely impaired by both image-based degradations (Ando, 2004) and a change of facial configurations (Jenkins & Langton, 2003), the underlying principles are believed to be based both on simple image cues such as contrast/brightness distribution and on more complex geometric processing to reconstruct the actual shape of the head. By incorporating both kinds of cues in our system's design, we are able to surpass the accuracy of existing eye-tracking systems, which rely exclusively on either image-based or geometry-based cues (Yamazoe et al., 2008). A side-benefit of this combined approach is that it allows for gaze estimation despite moderate view-point changes. This is important for settings where subjects, say young children or certain kinds of patients, might not be fully cooperative to allow a careful calibration. Our model and implementation of gaze estimation opens up new experimental questions about human mechanisms while also providing a useful tool for general calibration-free, non-intrusive remote eye-tracking.

link (url) DOI [BibTex]



Detecting synchrony in degraded audio-visual streams

Dhandhania, K., Wulff, J., Sinha, P.

Journal of Vision, 11(11):800-800, ARVO, September 2011 (article)

Abstract
Even 8–10 week old infants, when presented with two dynamic faces and a speech stream, look significantly longer at the ‘correct’ talking person (Patterson & Werker, 2003). This is true even though their reduced visual acuity prevents them from utilizing high spatial frequencies. Computational analyses in the field of audio/video synchrony and automatic speaker detection (e.g. Hershey & Movellan, 2000), in contrast, usually depend on high-resolution images. Therefore, the correlation mechanisms found in these computational studies are not directly applicable to the processes through which we learn to integrate the modalities of speech and vision. In this work, we investigated the correlation between speech signals and degraded video signals. We found a high correlation persisting even with high image degradation, resembling the low visual acuity of young infants. Additionally (in a fashion similar to Graf et al., 2002) we explored which parts of the face correlate with the audio in the degraded video sequences. Perfect synchrony and small offsets in the audio were used while finding the correlation, thereby detecting visual events preceding and following audio events. In order to achieve a sufficiently high temporal resolution, high-speed video sequences (500 frames per second) of talking people were used. This is a temporal resolution unachieved in previous studies and has allowed us to capture very subtle and short visual events. We believe that the results of this study might be interesting not only to vision researchers, but, by revealing subtle effects on a very fine timescale, also to people working in computer graphics and the generation and animation of artificial faces.

link (url) DOI [BibTex]



ISocRob-MSL 2011 Team Description Paper for Middle Sized League

Messias, J., Ahmad, A., Reis, J., Sousa, J., Lima, P.

15th Annual RoboCup International Symposium 2011, July 2011 (techreport)

Abstract
This paper describes the status of the ISocRob MSL robotic soccer team as required by the RoboCup 2011 qualification procedures. The most relevant technical and scientific developments carried out by the team since its last participation in the RoboCup MSL competitions are detailed here. These include cooperative localization, cooperative object tracking, planning under uncertainty, obstacle detection and improvements to self-localization.

link (url) [BibTex]



Trajectory Space: A Dual Representation for Nonrigid Structure from Motion

Akhter, I., Sheikh, Y., Khan, S., Kanade, T.

Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(7):1442-1456, IEEE, July 2011 (article)

Abstract
Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes. These bases are object dependent and therefore have to be estimated anew for each video sequence. In contrast, we propose a dual approach to describe the evolving 3D structure in trajectory space by a linear combination of basis trajectories. We describe the dual relationship between the two approaches, showing that they both have equal power for representing 3D structure. We further show that the temporal smoothness in 3D trajectories alone can be used for recovering nonrigid structure from a moving camera. The principal advantage of expressing deforming 3D structure in trajectory space is that we can define an object independent basis. This results in a significant reduction in unknowns, and corresponding stability in estimation. We propose the use of the Discrete Cosine Transform (DCT) as the object independent basis and empirically demonstrate that it approaches Principal Component Analysis (PCA) for natural motions. We report the performance of the proposed method, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piecewise rigid motion, partially nonrigid motion (such as facial expressions), and highly nonrigid motion (such as a person walking or dancing).

pdf project page [BibTex]



Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation

Sigal, L., Isard, M., Haussecker, H., Black, M. J.

International Journal of Computer Vision, 98(1):15-48, Springer Netherlands, May 2011 (article)

Abstract
We formulate the problem of 3D human pose estimation and tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely-connected body-parts. In particular, we model the body using an undirected graphical model in which nodes correspond to parts and edges to kinematic, penetration, and temporal constraints imposed by the joints and the world. These constraints are encoded using pair-wise statistical distributions, that are learned from motion-capture training data. Human pose and motion estimation is formulated as inference in this graphical model and is solved using Particle Message Passing (PaMPas). PaMPas is a form of non-parametric belief propagation that uses a variation of particle filtering that can be applied over a general graphical model with loops. The loose-limbed model and decentralized graph structure allow us to incorporate information from "bottom-up" visual cues, such as limb and head detectors, into the inference process. These detectors enable automatic initialization and aid recovery from transient tracking failures. We illustrate the method by automatically tracking people in multi-view imagery using a set of calibrated cameras and present quantitative evaluation using the HumanEva dataset.

pdf publisher's site link (url) Project Page Project Page [BibTex]



Point-and-Click Cursor Control With an Intracortical Neural Interface System by Humans With Tetraplegia

Kim, S., Simeral, J. D., Hochberg, L. R., Donoghue, J. P., Friehs, G. M., Black, M. J.

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 19(2):193-203, April 2011 (article)

Abstract
We present a point-and-click intracortical neural interface system (NIS) that enables humans with tetraplegia to volitionally move a 2D computer cursor in any desired direction on a computer screen, hold it still and click on the area of interest. This direct brain-computer interface extracts both discrete (click) and continuous (cursor velocity) signals from a single small population of neurons in human motor cortex. A key component of this system is a multi-state probabilistic decoding algorithm that simultaneously decodes neural spiking activity and outputs either a click signal or the velocity of the cursor. The algorithm combines a linear classifier, which determines whether the user is intending to click or move the cursor, with a Kalman filter that translates the neural population activity into cursor velocity. We present a paradigm for training the multi-state decoding algorithm using neural activity observed during imagined actions. Two human participants with tetraplegia (paralysis of the four limbs) performed a closed-loop radial target acquisition task using the point-and-click NIS over multiple sessions. We quantified point-and-click performance using various human-computer interaction measurements for pointing devices. We found that participants were able to control the cursor motion accurately and click on specified targets with a small error rate (< 3% in one participant). This study suggests that signals from a small ensemble of motor cortical neurons (~40) can be used for natural point-and-click 2D cursor control of a personal computer.

pdf publisher's site pub med link (url) Project Page [BibTex]



A Database and Evaluation Methodology for Optical Flow

Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., Szeliski, R.

International Journal of Computer Vision, 92(1):1-31, March 2011 (article)

Abstract
The quantitative evaluation of optical flow algorithms by Barron et al. (1994) led to significant advances in performance. The challenges for optical flow algorithms today go beyond the datasets and evaluation methods proposed in that paper. Instead, they center on problems associated with complex natural scenes, including nonrigid motion, real sensor noise, and motion discontinuities. We propose a new set of benchmarks and evaluation methods for the next generation of optical flow algorithms. To that end, we contribute four types of data to test different aspects of optical flow algorithms: (1) sequences with nonrigid motion where the ground-truth flow is determined by tracking hidden fluorescent texture, (2) realistic synthetic sequences, (3) high frame-rate video used to study interpolation error, and (4) modified stereo sequences of static scenes. In addition to the average angular error used by Barron et al., we compute the absolute flow endpoint error, measures for frame interpolation error, improved statistics, and results at motion discontinuities and in textureless regions. In October 2007, we published the performance of several well-known methods on a preliminary version of our data to establish the current state of the art. We also made the data freely available on the web at http://vision.middlebury.edu/flow/ . Subsequently a number of researchers have uploaded their results to our website and published papers using the data. A significant improvement in performance has already been achieved. In this paper we analyze the results obtained to date and draw a large number of conclusions from them.

pdf pdf from publisher Middlebury Flow Evaluation Website [BibTex]



Neural control of cursor trajectory and click by a human with tetraplegia 1000 days after implant of an intracortical microelectrode array

(J. Neural Engineering Highlights of 2011 Collection. JNE top 10 cited papers of 2010-2011.)

Simeral, J. D., Kim, S., Black, M. J., Donoghue, J. P., Hochberg, L. R.

J. of Neural Engineering, 8(2):025027, 2011 (article)

Abstract
The ongoing pilot clinical trial of the BrainGate neural interface system aims in part to assess the feasibility of using neural activity obtained from a small-scale, chronically implanted, intracortical microelectrode array to provide control signals for a neural prosthesis system. Critical questions include how long implanted microelectrodes will record useful neural signals, how reliably those signals can be acquired and decoded, and how effectively they can be used to control various assistive technologies such as computers and robotic assistive devices, or to enable functional electrical stimulation of paralyzed muscles. Here we examined these questions by assessing neural cursor control and BrainGate system characteristics on five consecutive days 1000 days after implant of a 4 × 4 mm array of 100 microelectrodes in the motor cortex of a human with longstanding tetraplegia subsequent to a brainstem stroke. On each of five prospectively-selected days we performed time-amplitude sorting of neuronal spiking activity, trained a population-based Kalman velocity decoding filter combined with a linear discriminant click state classifier, and then assessed closed-loop point-and-click cursor control. The participant performed both an eight-target center-out task and a random target Fitts metric task which was adapted from a human-computer interaction ISO standard used to quantify performance of computer input devices. The neural interface system was further characterized by daily measurement of electrode impedances, unit waveforms and local field potentials. Across the five days, spiking signals were obtained from 41 of 96 electrodes and were successfully decoded to provide neural cursor point-and-click control with a mean task performance of 91.3% ± 0.1% (mean ± s.d.) correct target acquisition. 
Results across five consecutive days demonstrate that a neural interface system based on an intracortical microelectrode array can provide repeatable, accurate point-and-click control of a computer interface to an individual with tetraplegia 1000 days after implantation of this sensor.

pdf pdf from publisher link (url) Project Page [BibTex]


Benchmark datasets for pose estimation and tracking

Andriluka, M., Sigal, L., Black, M. J.

In Visual Analysis of Humans: Looking at People, pages: 253-274, (Editors: Moeslund and Hilton and Krüger and Sigal), Springer-Verlag, London, 2011 (incollection)

publisher's site Project Page [BibTex]



Fields of experts

Roth, S., Black, M. J.

In Markov Random Fields for Vision and Image Processing, pages: 297-310, (Editors: Blake, A. and Kohli, P. and Rother, C.), MIT Press, 2011 (incollection)

Abstract
Fields of Experts are high-order Markov random field (MRF) models with potential functions that extend over large pixel neighborhoods. The clique potentials are modeled as a Product of Experts using nonlinear functions of many linear filter responses. In contrast to previous MRF approaches, all parameters, including the linear filters themselves, are learned from training data. A Field of Experts (FoE) provides a generic, expressive image prior that can capture the statistics of natural scenes, and can be used for a variety of machine vision tasks. The capabilities of FoEs are demonstrated with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the FoE model is trained on a generic image database and is not tuned toward a specific application, the results compete with specialized techniques.

publisher site [BibTex]



Dorsal Stream: From Algorithm to Neuroscience

Jhuang, H.

PhD Thesis, MIT, 2011 (techreport)

pdf [BibTex]


Modelling pipeline for subject-specific arterial blood flow—A review

Igor Sazonov, Si Yong Yeo, Rhodri Bevan, Xianghua Xie, Raoul van Loon, Perumal Nithiarasu

International Journal for Numerical Methods in Biomedical Engineering, 27(12):1868–1910, 2011 (article)

Abstract
In this paper, a robust and semi-automatic modelling pipeline for blood flow through subject-specific arterial geometries is presented. The framework developed consists of image segmentation, domain discretization (meshing) and fluid dynamics. All three subtopics of the pipeline are explained using an example of flow through a severely stenosed human carotid artery. In the Introduction, the state of the art of both image segmentation and meshing is presented in some detail, and wherever possible the advantages and disadvantages of the existing methods are analysed. Following this, the deformable model used for image segmentation is presented. This model is based upon a geometrical potential force (GPF), which is a function of the image. Both the GPF calculation and level set determination are explained. Following the image segmentation method, a semi-automatic meshing method used in the present study is explained in full detail. All the relevant techniques required to generate a valid domain discretization are presented. These techniques include generating a valid surface mesh, skeletonization, mesh cropping, boundary layer mesh construction and various mesh cosmetic methods that are essential for generating a high-quality domain discretization. After the mesh generation procedure, the generation of flow boundary conditions for both the inlets and outlets of a geometry is explained in detail. This is followed by a brief note on the flow solver, before studying the blood flow through the carotid artery with a severe stenosis.

[BibTex]



Geometrically Induced Force Interaction for Three-Dimensional Deformable Models

Si Yong Yeo, Xianghua Xie, Igor Sazonov, Perumal Nithiarasu

IEEE Transactions on Image Processing, 20(5):1373-1387, 2011 (article)

Abstract
In this paper, we propose a novel 3-D deformable model that is based upon a geometrically induced external force field which can be conveniently generalized to arbitrary dimensions. This external force field is based upon hypothesized interactions between the relative geometries of the deformable model and the object boundary characterized by image gradient. The evolution of the deformable model is solved using the level set method so that topological changes are handled automatically. The relative geometrical configurations between the deformable model and the object boundaries contribute to a dynamic vector force field that changes accordingly as the deformable model evolves. The geometrically induced dynamic interaction force has been shown to greatly improve the deformable model performance in acquiring complex geometries and highly concave boundaries, and it gives the deformable model a high invariancy in initialization configurations. The voxel interactions across the whole image domain provide a global view of the object boundary representation, giving the external force a long attraction range. The bidirectionality of the external force field allows the new deformable model to deal with arbitrary cross-boundary initializations, and facilitates the handling of weak edges and broken boundaries. In addition, we show that by enhancing the geometrical interaction field with a nonlocal edge-preserving algorithm, the new deformable model can effectively overcome image noise. We provide a comparative study on the segmentation of various geometries with different topologies from both synthetic and real images, and show that the proposed method achieves significant improvements against existing image gradient techniques.

[BibTex]



Steerable random fields for image restoration and inpainting

Roth, S., Black, M. J.

In Markov Random Fields for Vision and Image Processing, pages: 377-387, (Editors: Blake, A. and Kohli, P. and Rother, C.), MIT Press, 2011 (incollection)

Abstract
This chapter introduces the concept of a Steerable Random Field (SRF). In contrast to traditional Markov random field (MRF) models in low-level vision, the random field potentials of a SRF are defined in terms of filter responses that are steered to the local image structure. This steering uses the structure tensor to obtain derivative responses that are either aligned with, or orthogonal to, the predominant local image structure. Analysis of the statistics of these steered filter responses in natural images leads to the model proposed here. Clique potentials are defined over steered filter responses using a Gaussian scale mixture model and are learned from training data. The SRF model connects random fields with anisotropic regularization and provides a statistical motivation for the latter. Steering the random field to the local image structure improves image denoising and inpainting performance compared with traditional pairwise MRFs.

publisher site [BibTex]



Model-Based Pose Estimation

Pons-Moll, G., Rosenhahn, B.

In Visual Analysis of Humans: Looking at People, pages: 139-170, 9, (Editors: T. Moeslund, A. Hilton, V. Krueger, L. Sigal), Springer, 2011 (inbook)

book page pdf [BibTex]
