2020


3D Morphable Face Models - Past, Present and Future

Egger, B., Smith, W. A. P., Tewari, A., Wuhrer, S., Zollhoefer, M., Beeler, T., Bernard, F., Bolkart, T., Kortylewski, A., Romdhani, S., Theobalt, C., Blanz, V., Vetter, T.

ACM Transactions on Graphics, September 2020 (article)

Abstract
In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.

project page pdf preprint [BibTex]


General Movement Assessment from videos of computed 3D infant body models is equally effective compared to conventional RGB Video rating

Schroeder, S., Hesse, N., Weinberger, R., Tacke, U., Gerstl, L., Hilgendorff, A., Heinen, F., Arens, M., Bodensteiner, C., Dijkstra, L. J., Pujades, S., Black, M., Hadders-Algra, M.

Early Human Development, 144, May 2020 (article)

Abstract
Background: General Movement Assessment (GMA) is a powerful tool to predict Cerebral Palsy (CP). Yet, GMA requires substantial training hampering its implementation in clinical routine. This inspired a world-wide quest for automated GMA. Aim: To test whether a low-cost, marker-less system for three-dimensional motion capture from RGB depth sequences using a whole body infant model may serve as the basis for automated GMA. Study design: Clinical case study at an academic neurodevelopmental outpatient clinic. Subjects: Twenty-nine high-risk infants were recruited and assessed at their clinical follow-up at 2-4 month corrected age (CA). Their neurodevelopmental outcome was assessed regularly up to 12-31 months CA. Outcome measures: GMA according to Hadders-Algra by a masked GMA-expert of conventional and computed 3D body model (“SMIL motion”) videos of the same GMs. Agreement between both GMAs was assessed, and sensitivity and specificity of both methods to predict CP at ≥12 months CA. Results: The agreement of the two GMA ratings was substantial, with κ=0.66 for the classification of definitely abnormal (DA) GMs and an ICC of 0.887 (95% CI 0.762;0.947) for a more detailed GM-scoring. Five children were diagnosed with CP (four bilateral, one unilateral CP). The GMs of the child with unilateral CP were twice rated as mildly abnormal. DA-ratings of both videos predicted bilateral CP well: sensitivity 75% and 100%, specificity 88% and 92% for conventional and SMIL motion videos, respectively. Conclusions: Our computed infant 3D full body model is an attractive starting point for automated GMA in infants at risk of CP.

DOI [BibTex]
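
The sensitivity and specificity figures quoted in the abstract above follow from a standard 2x2 confusion table. The sketch below shows the arithmetic in Python. The counts are illustrative assumptions chosen to be consistent with the reported percentages (the abstract states only 29 infants and four bilateral CP cases, not the table itself), and the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts."""
    sensitivity = tp / (tp + fn)   # fraction of true cases rated abnormal
    specificity = tn / (tn + fp)   # fraction of non-cases rated not abnormal
    return sensitivity, specificity

# Assumed counts: 4 infants with bilateral CP and 25 without (29 in total).
conventional = sensitivity_specificity(tp=3, fn=1, tn=22, fp=3)
smil_motion = sensitivity_specificity(tp=4, fn=0, tn=23, fp=2)
```

With only four positive cases, a single differently rated infant moves sensitivity by 25 percentage points, so the two methods may be closer than the percentages alone suggest.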


Learning Multi-Human Optical Flow

Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., Black, M. J.

International Journal of Computer Vision (IJCV), (128):873–890, April 2020 (article)

Abstract
The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by them does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they can generalize well to real image sequences. The code, trained models and the dataset are available for research.

Paper Publisher Version poster link (url) DOI [BibTex]


Real Time Trajectory Prediction Using Deep Conditional Generative Models

Gomez-Gonzalez, S., Prokudin, S., Schölkopf, B., Peters, J.

IEEE Robotics and Automation Letters, 5(2):970-976, IEEE, January 2020 (article)

arXiv DOI [BibTex]

2017


Learning a model of facial shape and expression from 4D scans

Li, T., Bolkart, T., Black, M. J., Li, H., Romero, J.

ACM Transactions on Graphics, 36(6):194:1-194:17, November 2017, Two first authors contributed equally (article)

Abstract
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression from 4D face sequences in the D3DFACS dataset along with additional 4D sequences. We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33,000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]


Investigating Body Image Disturbance in Anorexia Nervosa Using Novel Biometric Figure Rating Scales: A Pilot Study

Mölbert, S. C., Thaler, A., Streuber, S., Black, M. J., Karnath, H., Zipfel, S., Mohler, B., Giel, K. E.

European Eating Disorders Review, 25(6):607-612, November 2017 (article)

Abstract
This study uses novel biometric figure rating scales (FRS) spanning body mass index (BMI) 13.8 to 32.2 kg/m² and BMI 18 to 42 kg/m². The aims of the study were (i) to compare FRS body weight dissatisfaction and perceptual distortion of women with anorexia nervosa (AN) to a community sample; (ii) how FRS parameters are associated with questionnaire body dissatisfaction, eating disorder symptoms and appearance comparison habits; and (iii) whether the weight spectrum of the FRS matters. Women with AN (n = 24) and a community sample of women (n = 104) selected their current and ideal body on the FRS and completed additional questionnaires. Women with AN accurately picked the body that aligned best with their actual weight in both FRS. Controls underestimated their BMI in the FRS 14–32 and were accurate in the FRS 18–42. In both FRS, women with AN desired a body close to their actual BMI and controls desired a thinner body. Our observations suggest that body image disturbance in AN is unlikely to be characterized by a visual perceptual disturbance, but rather by an idealization of underweight in conjunction with high body dissatisfaction. The weight spectrum of FRS can influence the accuracy of BMI estimation.

publisher DOI Project Page [BibTex]


Embodied Hands: Modeling and Capturing Hands and Bodies Together

Romero, J., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):245:1-245:17, ACM, November 2017 (article)

Abstract
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low-resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural, activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full body models that move naturally with detailed hand motions and a realism not seen before in full body performance capture. The models and data are freely available for research purposes at http://mano.is.tue.mpg.de.

website youtube paper suppl video link (url) DOI Project Page [BibTex]


An Online Scalable Approach to Unified Multirobot Cooperative Localization and Object Tracking

Ahmad, A., Lawless, G., Lima, P.

IEEE Transactions on Robotics (T-RO), 33, pages: 1184 - 1199, October 2017 (article)

Abstract
In this article we present a unified approach for multi-robot cooperative simultaneous localization and object tracking based on particle filters. Our approach is scalable with respect to the number of robots in the team. We introduce a method that reduces, from an exponential to a linear growth, the space and computation time requirements with respect to the number of robots in order to maintain a given level of accuracy in the full state estimation. Our method requires no increase in the number of particles with respect to the number of robots. However, in our method each particle represents a full state hypothesis, leading to the linear dependency on the number of robots of both space and time complexity. The derivation of the algorithm implementing our approach from a standard particle filter algorithm and its complexity analysis are presented. Through an extensive set of simulation experiments on a large number of randomized datasets, we demonstrate the correctness and efficacy of our approach. Through real robot experiments on a standardized open dataset of a team of four soccer playing robots tracking a ball, we evaluate our method's estimation accuracy with respect to the ground truth values. Through comparisons with other methods based on i) nonlinear least squares minimization and ii) joint extended Kalman filter, we further highlight our method's advantages. Finally, we also present a robustness test for our approach by evaluating it under scenarios of communication and vision failure in teammate robots.

Published Version link (url) DOI [BibTex]


Parameterized Model of 2D Articulated Human Shape

Black, M. J., Freifeld, O., Weiss, A., Loper, M., Guan, P.

September 2017, U.S. Patent 9,761,060 (misc)

Abstract
Disclosed are computer-readable devices, systems and methods for generating a model of a clothed body. The method includes generating a model of an unclothed human body, the model capturing a shape or a pose of the unclothed human body, determining two-dimensional contours associated with the model, and computing deformations by aligning a contour of a clothed human body with a contour of the unclothed human body. Based on the two-dimensional contours and the deformations, the method includes generating a first two-dimensional model of the unclothed human body, the first two-dimensional model factoring the deformations of the unclothed human body into one or more of a shape variation component, a viewpoint change, and a pose variation and learning an eigen-clothing model using principal component analysis applied to the deformations, wherein the eigen-clothing model classifies different types of clothing, to yield a second two-dimensional model of a clothed human body.

Google Patents [BibTex]


Crowdshaping Realistic 3D Avatars with Words

Streuber, S., Ramirez, M. Q., Black, M., Zuffi, S., O’Toole, A., Hill, M. Q., Hahn, C. A.

August 2017, Application PCT/EP2017/051954 (misc)

Abstract
A method for generating a body shape, comprising the steps:
- receiving one or more linguistic descriptors related to the body shape;
- retrieving an association between the one or more linguistic descriptors and a body shape; and
- generating the body shape, based on the association.

Google Patents [BibTex]


System and method for simulating realistic clothing

Black, M. J., Guan, P.

June 2017, U.S. Patent 9,679,409 B2 (misc)

Abstract
Systems, methods, and computer-readable storage media for simulating realistic clothing. The system generates a clothing deformation model for a clothing type, wherein the clothing deformation model factors a change of clothing shape due to rigid limb rotation, pose-independent body shape, and pose-dependent deformations. Next, the system generates a custom-shaped garment for a given body by mapping, via the clothing deformation model, body shape parameters to clothing shape parameters. The system then automatically dresses the given body with the custom-shaped garment.

Google Patents pdf [BibTex]


Early Stopping Without a Validation Set

Mahsereci, M., Balles, L., Lassner, C., Hennig, P.

arXiv preprint arXiv:1703.09580, 2017 (article)

Abstract
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.

link (url) Project Page Project Page [BibTex]
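
To make the idea of a stopping rule built from local gradient statistics concrete, here is a minimal Python sketch. It assumes a criterion that compares the squared mini-batch gradient with the estimated variance of the per-sample gradients and stops once the gradient is no larger than what sampling noise alone would produce. The exact form used in the paper may differ, and the function name `should_stop` and the variance floor are illustrative assumptions.

```python
def should_stop(per_sample_grads):
    """Signal-to-noise stopping test on one mini-batch (assumed form).

    per_sample_grads: list of B gradient vectors, each a list of D floats.
    Returns True when the mean gradient is indistinguishable from noise.
    """
    B = len(per_sample_grads)
    D = len(per_sample_grads[0])
    total = 0.0
    for k in range(D):
        column = [g[k] for g in per_sample_grads]
        mean = sum(column) / B
        var = sum((x - mean) ** 2 for x in column) / (B - 1)  # sample variance
        var = max(var, 1e-12)          # guard against division by zero
        total += B * mean ** 2 / var   # signal-to-noise ratio of coordinate k
    return 1.0 - total / D > 0.0
```

A batch whose per-sample gradients cancel out, such as `[[1.0, -1.0], [-1.0, 1.0]]`, triggers the stop, while a batch with a large consistent gradient does not. No held-out data is involved in either case.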


Data-Driven Physics for Human Soft Tissue Animation

Kim, M., Pons-Moll, G., Pujades, S., Bang, S., Kim, J., Black, M. J., Lee, S.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):54:1-54:12, 2017 (article)

Abstract
Data driven models of human poses and soft-tissue deformations can produce very realistic results, but they only model the visible surface of the human body and cannot create skin deformation due to interactions with the environment. Physical simulations can generalize to external forces, but their parameters are difficult to control. In this paper, we present a layered volumetric human body model learned from data. Our model is composed of a data-driven inner layer and a physics-based external layer. The inner layer is driven with a volumetric statistical body model (VSMPL). The soft tissue layer consists of a tetrahedral mesh that is driven using the finite element method (FEM). Model parameters, namely the segmentation of the body into layers and the soft tissue elasticity, are learned directly from 4D registrations of humans exhibiting soft tissue deformations. The learned two layer model is a realistic full-body avatar that generalizes to novel motions and external forces. Experiments show that the resulting avatars produce realistic results on held out sequences and react to external forces. Moreover, the model supports the retargeting of physical properties from one avatar when they share the same topology.

video paper link (url) Project Page [BibTex]


Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

(Best Paper, Eurographics 2017)

Marcard, T. V., Rosenhahn, B., Black, M., Pons-Moll, G.

Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics), pages: 349-360, 2017 (article)

Abstract
We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker Sparse Inertial Poser (SIP) enables motion capture using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall.

video pdf Project Page [BibTex]


Efficient 2D and 3D Facade Segmentation using Auto-Context

Gadde, R., Jampani, V., Marlet, R., Gehler, P.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017 (article)

Abstract
This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we are describing a system that is almost domain independent and consists of standard segmentation methods. We train a sequence of boosted decision trees using auto-context features. This is learned using stacked generalization. We find that this technique performs better, or comparable with all previous published methods and present empirical results on all available 2D and 3D facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test-time inference.

arXiv Project Page [BibTex]


ClothCap: Seamless 4D Clothing Capture and Retargeting

Pons-Moll, G., Pujades, S., Hu, S., Black, M.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):73:1-73:15, ACM, New York, NY, USA, 2017, Two first authors contributed equally (article)

Abstract
Designing and simulating realistic clothing is challenging and, while several methods have addressed the capture of clothing from 3D scans, previous methods have been limited to single garments and simple motions, lack detail, or require specialized texture patterns. Here we address the problem of capturing regular clothing on fully dressed people in motion. People typically wear multiple pieces of clothing at a time. To estimate the shape of such clothing, track it over time, and render it believably, each garment must be segmented from the others and the body. Our ClothCap approach uses a new multi-part 3D model of clothed bodies, automatically segments each piece of clothing, estimates the naked body shape and pose under the clothing, and tracks the 3D deformations of the clothing over time. We estimate the garments and their motion from 4D scans; that is, high-resolution 3D scans of the subject in motion at 60 fps. The model allows us to capture a clothed person in motion, extract their clothing, and retarget the clothing to new body shapes. ClothCap provides a step towards virtual try-on with a technology for capturing, modeling, and analyzing clothing in motion.

video project_page paper link (url) DOI Project Page Project Page [BibTex]

2016


Skinned multi-person linear model

Black, M.J., Loper, M., Mahmood, N., Pons-Moll, G., Romero, J.

December 2016, Application PCT/EP2016/064610 (misc)

Abstract
The invention comprises a learned model of human body shape and pose dependent shape variation that is more accurate than previous models and is compatible with existing graphics pipelines. Our Skinned Multi-Person Linear model (SMPL) is a skinned vertex based model that accurately represents a wide variety of body shapes in natural human poses. The parameters of the model are learned from data including the rest pose template, blend weights, pose-dependent blend shapes, identity-dependent blend shapes, and a regressor from vertices to joint locations. Unlike previous models, the pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices. This simple formulation enables training the entire model from a relatively large number of aligned 3D meshes of different people in different poses. The invention quantitatively evaluates variants of SMPL using linear or dual-quaternion blend skinning and shows that both are more accurate than a Blend SCAPE model trained on the same data. In a further embodiment, the invention realistically models dynamic soft-tissue deformations. Because it is based on blend skinning, SMPL is compatible with existing rendering engines and we make it available for research purposes.

Google Patents [BibTex]
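
The statement that pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices can be illustrated with a small NumPy sketch. It assumes an axis-angle pose per joint and subtracts the identity so that the rest pose produces no correction. The array sizes, the random `pose_dirs` weights, and the function names are stand-ins for illustration, not the released model.

```python
import numpy as np

def rodrigues(axis_angle):
    """Convert one axis-angle vector (3,) to a 3x3 rotation matrix."""
    angle = np.linalg.norm(axis_angle)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = axis_angle / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])  # cross-product matrix
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def pose_blend_offsets(pose, pose_dirs):
    """Vertex offsets that are linear in the rotation-matrix elements.

    pose:      (J, 3) axis-angle rotation per joint
    pose_dirs: (9 * J, V, 3) blend shapes, one per rotation-matrix element
    """
    rot = np.stack([rodrigues(p) for p in pose])      # (J, 3, 3)
    feature = (rot - np.eye(3)).reshape(-1)           # zero in the rest pose
    return np.tensordot(feature, pose_dirs, axes=1)   # (V, 3)

rng = np.random.default_rng(0)
J, V = 4, 10                                  # toy sizes, not the real model
pose_dirs = rng.normal(size=(9 * J, V, 3))    # random stand-in for learned weights
offsets = pose_blend_offsets(rng.normal(scale=0.2, size=(J, 3)), pose_dirs)
```

Because the offsets are a single matrix product with the rotation features, the weights can be fitted with ordinary linear least squares once the rotations are known, which is what makes training on many meshes tractable.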


Creating body shapes from verbal descriptions by linking similarity spaces

Hill, M. Q., Streuber, S., Hahn, C. A., Black, M. J., O’Toole, A. J.

Psychological Science, 27(11):1486-1497, November 2016 (article)

Abstract
Brief verbal descriptions of bodies (e.g. curvy, long-legged) can elicit vivid mental images. The ease with which we create these mental images belies the complexity of three-dimensional body shapes. We explored the relationship between body shapes and body descriptions and show that a small number of words can be used to generate categorically accurate representations of three-dimensional bodies. The dimensions of body shape variation that emerged in a language-based similarity space were related to major dimensions of variation computed directly from three-dimensional laser scans of 2094 bodies. This allowed us to generate three-dimensional models of people in the shape space using only their coordinates on analogous dimensions in the language-based description space. Human descriptions of photographed bodies and their corresponding models matched closely. The natural mapping between the spaces illustrates the role of language as a concise code for body shape, capturing perceptually salient global and local body features.

pdf [BibTex]


Body Talk: Crowdshaping Realistic 3D Avatars with Words

Streuber, S., Quiros-Ramirez, M. A., Hill, M. Q., Hahn, C. A., Zuffi, S., O’Toole, A., Black, M. J.

ACM Trans. Graph. (Proc. SIGGRAPH), 35(4):54:1-54:14, July 2016 (article)

Abstract
Realistic, metrically accurate, 3D human avatars are useful for games, shopping, virtual reality, and health applications. Such avatars are not in wide use because solutions for creating them from high-end scanners, low-cost range cameras, and tailoring measurements all have limitations. Here we propose a simple solution and show that it is surprisingly accurate. We use crowdsourcing to generate attribute ratings of 3D body shapes corresponding to standard linguistic descriptions of 3D shape. We then learn a linear function relating these ratings to 3D human shape parameters. Given an image of a new body, we again turn to the crowd for ratings of the body shape. The collection of linguistic ratings of a photograph provides remarkably strong constraints on the metric 3D shape. We call the process crowdshaping and show that our Body Talk system produces shapes that are perceptually indistinguishable from bodies created from high-resolution scans and that the metric accuracy is sufficient for many tasks. This makes body “scanning” practical without a scanner, opening up new applications including database search, visualization, and extracting avatars from books.

pdf web tool video talk (ppt) [BibTex]
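
The step of learning a linear function from word ratings to 3D shape parameters can be sketched as an ordinary least-squares fit. The NumPy example below uses synthetic data only. The numbers of bodies, words and shape parameters, the bias column, and the function name `ratings_to_shape` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bodies, n_words, n_shape = 200, 30, 8        # toy sizes (assumptions)

ratings = rng.normal(size=(n_bodies, n_words))        # mean crowd rating per word
true_map = rng.normal(size=(n_words + 1, n_shape))    # hidden map used to fake data
X = np.hstack([ratings, np.ones((n_bodies, 1))])      # append a bias column
shapes = X @ true_map + 0.01 * rng.normal(size=(n_bodies, n_shape))

# Fit the linear function ratings -> shape parameters by least squares.
weights, *_ = np.linalg.lstsq(X, shapes, rcond=None)

def ratings_to_shape(word_ratings):
    """Predict shape parameters for one body from its word ratings."""
    return np.append(word_ratings, 1.0) @ weights

predicted = ratings_to_shape(ratings[0])
```

For a new photograph, the crowd ratings would be averaged per word and passed to `ratings_to_shape`, and the resulting parameters would then drive whichever body model the shape space belongs to.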


Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

Tzionas, D., Ballan, L., Srikantha, A., Aponte, P., Pollefeys, M., Gall, J.

International Journal of Computer Vision (IJCV), 118(2):172-193, June 2016 (article)

Abstract
Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error and with collision detection and physics simulation to achieve physically plausible estimates even in case of occlusions and missing visual data. Since all components are unified in a single objective function which is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom.

Website pdf link (url) DOI Project Page [BibTex]


Human Pose Estimation from Video and IMUs

Marcard, T. V., Pons-Moll, G., Rosenhahn, B.

Transactions on Pattern Analysis and Machine Intelligence PAMI, 38(8):1533-1547, January 2016 (article)

data pdf dataset_documentation [BibTex]


Moving-horizon Nonlinear Least Squares-based Multirobot Cooperative Perception

Ahmad, A., Bülthoff, H.

Robotics and Autonomous Systems, 83, pages: 275-286, 2016 (article)

Abstract
In this article we present an online estimator for multirobot cooperative localization and target tracking based on nonlinear least squares minimization. Our method not only makes the rigorous optimization-based approach applicable online but also allows the estimator to be stable and convergent. We do so by employing a moving horizon technique to nonlinear least squares minimization and a novel design of the arrival cost function that ensures stability and convergence of the estimator. Through an extensive set of real robot experiments, we demonstrate the robustness of our method as well as the optimality of the arrival cost function. The experiments include comparisons of our method with i) an extended Kalman filter-based online-estimator and ii) an offline-estimator based on full-trajectory nonlinear least squares.

DOI Project Page [BibTex]


Perceiving Systems (2011-2015)
Scientific Advisory Board Report, 2016 (misc)

pdf [BibTex]


Shape estimation of subcutaneous adipose tissue using an articulated statistical shape model

Yeo, S. Y., Romero, J., Loper, M., Machann, J., Black, M.

Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 0(0):1-8, 2016 (article)

publisher website preprint pdf link (url) DOI Project Page [BibTex]


The GRASP Taxonomy of Human Grasp Types

Feix, T., Romero, J., Schmiedmayer, H., Dollar, A., Kragic, D.

Human-Machine Systems, IEEE Transactions on, 46(1):66-77, 2016 (article)

publisher website pdf DOI Project Page [BibTex]


Map-Based Probabilistic Visual Self-Localization

Brubaker, M. A., Geiger, A., Urtasun, R.

IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2016 (article)

Abstract
Accurate and efficient self-localization is a critical problem for autonomous systems. This paper describes an affordable solution to vehicle self-localization which uses odometry computed from two video cameras and road maps as the sole inputs. The core of the method is a probabilistic model for which an efficient approximate inference algorithm is derived. The inference algorithm is able to utilize distributed computation in order to meet the real-time requirements of autonomous systems in some instances. Because of the probabilistic nature of the model the method is capable of coping with various sources of uncertainty including noise in the visual odometry and inherent ambiguities in the map (e.g., in a Manhattan world). By exploiting freely available, community developed maps and visual odometry measurements, the proposed method is able to localize a vehicle to 4m on average after 52 seconds of driving on maps which contain more than 2,150km of drivable roads.

pdf Project Page [BibTex]

2012


An SVD-Based Approach for Ghost Detection and Removal in High Dynamic Range Images

Srikantha, A., Sidibe, D., Meriaudeau, F.

International Conference on Pattern Recognition (ICPR), pages: 380-383, November 2012 (article)

pdf [BibTex]


Coupled Action Recognition and Pose Estimation from Multiple Views

Yao, A., Gall, J., van Gool, L.

International Journal of Computer Vision, 100(1):16-37, October 2012 (article)

publisher's site code pdf Project Page Project Page Project Page [BibTex]



DRAPE: DRessing Any PErson

Guan, P., Reiss, L., Hirshberg, D., Weiss, A., Black, M. J.

ACM Trans. on Graphics (Proc. SIGGRAPH), 31(4):35:1-35:10, July 2012 (article)

Abstract
We describe a complete system for animating realistic clothing on synthetic bodies of any shape and pose without manual intervention. The key component of the method is a model of clothing called DRAPE (DRessing Any PErson) that is learned from a physics-based simulation of clothing on bodies of different shapes and poses. The DRAPE model has the desirable property of "factoring" clothing deformations due to body shape from those due to pose variation. This factorization provides an approximation to the physical clothing deformation and greatly simplifies clothing synthesis. Given a parameterized model of the human body with known shape and pose parameters, we describe an algorithm that dresses the body with a garment that is customized to fit and possesses realistic wrinkles. DRAPE can be used to dress static bodies or animated sequences with a learned model of the cloth dynamics. Since the method is fully automated, it is appropriate for dressing large numbers of virtual characters of varying shape. The method is significantly more efficient than physical simulation.

YouTube pdf talk Project Page Project Page [BibTex]



Ghost Detection and Removal for High Dynamic Range Images: Recent Advances

Srikantha, A., Sidibé, D.

Signal Processing: Image Communication, 27, pages: 650-662, July 2012 (article)

pdf link (url) [BibTex]



Visual Servoing on Unknown Objects

Gratal, X., Romero, J., Bohg, J., Kragic, D.

Mechatronics, 22(4):423-435, Elsevier, June 2012, Visual Servoing SI (article)

Abstract
We study visual servoing in a framework of detection and grasping of unknown objects. Classically, visual servoing has been used for applications where the object to be servoed on is known to the robot prior to the task execution. In addition, most of the methods concentrate on aligning the robot hand with the object without grasping it. In our work, visual servoing techniques are used as building blocks in a system capable of detecting and grasping unknown objects in natural scenes. We show how different visual servoing techniques facilitate a complete grasping cycle.

Grasping sequence video Offline calibration video Pdf DOI [BibTex]



Visual Orientation and Directional Selectivity Through Thalamic Synchrony

Stanley, G., Jin, J., Wang, Y., Desbordes, G., Wang, Q., Black, M., Alonso, J.

Journal of Neuroscience, 32(26):9073-9088, June 2012 (article)

Abstract
Thalamic neurons respond to visual scenes by generating synchronous spike trains on the timescale of 10–20 ms that are very effective at driving cortical targets. Here we demonstrate that this synchronous activity contains unexpectedly rich information about fundamental properties of visual stimuli. We report that the occurrence of synchronous firing of cat thalamic cells with highly overlapping receptive fields is strongly sensitive to the orientation and the direction of motion of the visual stimulus. We show that this stimulus selectivity is robust, remaining relatively unchanged under different contrasts and temporal frequencies (stimulus velocities). A computational analysis based on an integrate-and-fire model of the direct thalamic input to a layer 4 cortical cell reveals a strong correlation between the degree of thalamic synchrony and the nonlinear relationship between cortical membrane potential and the resultant firing rate. Together, these findings suggest a novel population code in the synchronous firing of neurons in the early visual pathway that could serve as the substrate for establishing cortical representations of the visual scene.

preprint publisher's site Project Page [BibTex]



Bilinear Spatiotemporal Basis Models

Akhter, I., Simon, T., Khan, S., Matthews, I., Sheikh, Y.

ACM Transactions on Graphics (TOG), 31(2):17, ACM, April 2012 (article)

Abstract
A variety of dynamic objects, such as faces, bodies, and cloth, are represented in computer graphics as a collection of moving spatial landmarks. Spatiotemporal data is inherent in a number of graphics applications including animation, simulation, and object and camera tracking. The principal modes of variation in the spatial geometry of objects are typically modeled using dimensionality reduction techniques, while concurrently, trajectory representations like splines and autoregressive models are widely used to exploit the temporal regularity of deformation. In this article, we present the bilinear spatiotemporal basis as a model that simultaneously exploits spatial and temporal regularity while maintaining the ability to generalize well to new sequences. This factorization allows the use of analytical, predefined functions to represent temporal variation (e.g., B-Splines or the Discrete Cosine Transform) resulting in efficient model representation and estimation. The model can be interpreted as representing the data as a linear combination of spatiotemporal sequences consisting of shape modes oscillating over time at key frequencies. We apply the bilinear model to natural spatiotemporal phenomena, including face, body, and cloth motion data, and compare it in terms of compaction, generalization ability, predictive precision, and efficiency to existing models. We demonstrate the application of the model to a number of graphics tasks including labeling, gap-filling, denoising, and motion touch-up.

pdf project page link (url) [BibTex]
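
The bilinear idea in the abstract above can be illustrated with a minimal sketch. It assumes a sequence matrix S (frames by stacked landmark coordinates) approximated as Theta C B^T, with a predefined DCT trajectory basis Theta and a shape basis B taken from an SVD of the data. The function names, the lack of mean-centering, and the SVD-based shape basis are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dct_basis(num_frames, num_terms):
    """Orthonormal DCT-II basis: one column per temporal frequency."""
    t = np.arange(num_frames)[:, None]
    k = np.arange(num_terms)[None, :]
    theta = np.sqrt(2.0 / num_frames) * np.cos(np.pi * (2 * t + 1) * k / (2 * num_frames))
    theta[:, 0] /= np.sqrt(2.0)  # rescale the constant term so columns have unit norm
    return theta

def fit_bilinear(S, num_time_terms, num_shape_modes):
    """Fit S ~ theta @ C @ B.T (hypothetical helper, not from the paper).

    S has shape (frames, landmark coordinates). theta is the fixed DCT basis,
    B holds the leading right singular vectors of S as shape modes.
    """
    theta = dct_basis(S.shape[0], num_time_terms)
    _, _, vt = np.linalg.svd(S, full_matrices=False)
    B = vt[:num_shape_modes].T
    C = theta.T @ S @ B  # both bases are orthonormal, so projection gives the coefficients
    return theta, C, B

# Synthetic sequence that is smooth in time and low-rank in shape.
rng = np.random.default_rng(0)
shape_modes, _ = np.linalg.qr(rng.standard_normal((6, 3)))
S = dct_basis(40, 4) @ rng.standard_normal((4, 3)) @ shape_modes.T

theta, C, B = fit_bilinear(S, num_time_terms=4, num_shape_modes=3)
reconstruction = theta @ C @ B.T
```

In this sketch the 40 x 6 sequence is summarized by a 4 x 3 coefficient matrix, which is the kind of compaction the abstract refers to.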



A metric for comparing the anthropomorphic motion capability of artificial hands

Feix, T., Romero, J., Ek, C. H., Schmiedmayer, H., Kragic, D.

IEEE RAS Transactions on Robotics, TRO, pages: 974-980, 2012 (article)

Publisher site Human Grasping Database Project [BibTex]



The Ankyrin 3 (ANK3) Bipolar Disorder Gene Regulates Psychiatric-related Behaviors that are Modulated by Lithium and Stress

Leussis, M., Berry-Scott, E., Saito, M., Jhuang, H., Haan, G., Alkan, O., Luce, C., Madison, J., Sklar, P., Serre, T., Root, D., Petryshen, T.

Biological Psychiatry, 2012 (article)

Prepublication Article Abstract [BibTex]



Natural Metrics and Least-Committed Priors for Articulated Tracking

Soren Hauberg, Stefan Sommer, Kim S. Pedersen

Image and Vision Computing, 30(6-7):453-461, Elsevier, 2012 (article)

Publishers site Code PDF [BibTex]


2008


A non-parametric Bayesian alternative to spike sorting

Wood, F., Black, M. J.

J. Neuroscience Methods, 173(1):1–12, August 2008 (article)

Abstract
The analysis of extra-cellular neural recordings typically begins with careful spike sorting and all analysis of the data then rests on the correctness of the resulting spike trains. In many situations this is unproblematic as experimental and spike sorting procedures often focus on well isolated units. There is evidence in the literature, however, that errors in spike sorting can occur even with carefully collected and selected data. Additionally, chronically implanted electrodes and arrays with fixed electrodes cannot be easily adjusted to provide well isolated units. In these situations, multiple units may be recorded and the assignment of waveforms to units may be ambiguous. At the same time, analysis of such data may be both scientifically important and clinically relevant. In this paper we address this issue using a novel probabilistic model that accounts for several important sources of uncertainty and error in spike sorting. In lieu of sorting neural data to produce a single best spike train, we estimate a probabilistic model of spike trains given the observed data. We show how such a distribution over spike sortings can support standard neuroscientific questions while providing a representation of uncertainty in the analysis. As a representative illustration of the approach, we analyzed primary motor cortical tuning with respect to hand movement in data recorded with a chronic multi-electrode array in non-human primates. We found that the probabilistic analysis generally agrees with human sorters but suggests the presence of tuned units not detected by humans.

pdf preprint pdf from publisher PubMed [BibTex]



Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia

(J. Neural Engineering Highlights of 2008 Collection)

Kim, S., Simeral, J., Hochberg, L., Donoghue, J. P., Black, M. J.

J. Neural Engineering, 5, pages: 455–476, 2008 (article)

Abstract
Computer-mediated connections between human motor cortical neurons and assistive devices promise to improve or restore lost function in people with paralysis. Recently, a pilot clinical study of an intracortical neural interface system demonstrated that a tetraplegic human was able to obtain continuous two-dimensional control of a computer cursor using neural activity recorded from his motor cortex. This control, however, was not sufficiently accurate for reliable use in many common computer control tasks. Here, we studied several central design choices for such a system including the kinematic representation for cursor movement, the decoding method that translates neuronal ensemble spiking activity into a control signal and the cursor control task used during training for optimizing the parameters of the decoding method. In two tetraplegic participants, we found that controlling a cursor’s velocity resulted in more accurate closed-loop control than controlling its position directly and that cursor velocity control was achieved more rapidly than position control. Control quality was further improved over conventional linear filters by using a probabilistic method, the Kalman filter, to decode human motor cortical activity. Performance assessment based on standard metrics used for the evaluation of a wide range of pointing devices demonstrated significantly improved cursor control with velocity rather than position decoding.

pdf preprint pdf from publisher [BibTex]
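
As a rough illustration of the Kalman-filter decoding mentioned in the abstract above, the sketch below runs the standard predict and update steps to estimate a 2D cursor velocity from a vector of firing rates. All matrices (A, W, H, Q), the five-neuron tuning matrix, and the noise levels are made-up assumptions for illustration; they are not the parameters or the training procedure used in the study.

```python
import numpy as np

def kalman_step(x, P, z, A, W, H, Q):
    """One predict/update cycle of a standard Kalman filter.

    x, P : previous state estimate (e.g. cursor velocity) and its covariance
    z    : current observation (e.g. binned firing rates)
    A, W : assumed state transition matrix and process noise covariance
    H, Q : assumed observation (tuning) matrix and observation noise covariance
    """
    # Predict the state forward in time.
    x_pred = A @ x
    P_pred = A @ P @ A.T + W
    # Correct the prediction with the new observation.
    S = H @ P_pred @ H.T + Q
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical setup: 5 neurons linearly tuned to a 2D velocity.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [0.5, 0.5]])
A = np.eye(2)
W = 0.01 * np.eye(2)
Q = 0.1 * np.eye(5)

true_velocity = np.array([0.8, -0.3])
x, P = np.zeros(2), np.eye(2)
for _ in range(30):
    z = H @ true_velocity  # noise-free firing rates, for illustration only
    x, P = kalman_step(x, P, z, A, W, H, Q)
```

With a constant true velocity and noise-free observations, the estimate x settles on the true velocity while the covariance P shrinks from its initial value.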



Brownian Warps for Non-Rigid Registration

Mads Nielsen, Peter Johansen, Andrew Jackson, Benny Lautrup, Soren Hauberg

Journal of Mathematical Imaging and Vision, 31, pages: 221-231, Springer Netherlands, 2008 (article)

Publishers site PDF [BibTex]



An Efficient Algorithm for Modelling Duration in Hidden Markov Models, with a Dramatic Application

Soren Hauberg, Jakob Sloth

Journal of Mathematical Imaging and Vision, 31, pages: 165-170, Springer Netherlands, 2008 (article)

Publishers site Paper site PDF [BibTex]


2004


On the variability of manual spike sorting

Wood, F., Black, M. J., Vargas-Irwin, C., Fellows, M., Donoghue, J. P.

IEEE Trans. Biomedical Engineering, 51(6):912-918, June 2004 (article)

pdf pdf from publisher [BibTex]



Modeling and decoding motor cortical activity using a switching Kalman filter

Wu, W., Black, M. J., Mumford, D., Gao, Y., Bienenstock, E., Donoghue, J. P.

IEEE Trans. Biomedical Engineering, 51(6):933-942, June 2004 (article)

Abstract
We present a switching Kalman filter model for the real-time inference of hand kinematics from a population of motor cortical neurons. Firing rates are modeled as a Gaussian mixture where the mean of each Gaussian component is a linear function of hand kinematics. A “hidden state” models the probability of each mixture component and evolves over time in a Markov chain. The model generalizes previous encoding and decoding methods, addresses the non-Gaussian nature of firing rates, and can cope with crudely sorted neural data common in on-line prosthetic applications.

pdf pdf from publisher [BibTex]


2000


Probabilistic detection and tracking of motion boundaries

Black, M. J., Fleet, D. J.

Int. J. of Computer Vision, 38(3):231-245, July 2000 (article)

Abstract
We propose a Bayesian framework for representing and recognizing local image motion in terms of two basic models: translational motion and motion boundaries. Motion boundaries are represented using a non-linear generative model that explicitly encodes the orientation of the boundary, the velocities on either side, the motion of the occluding edge over time, and the appearance/disappearance of pixels at the boundary. We represent the posterior probability distribution over the model parameters given the image data using discrete samples. This distribution is propagated over time using a particle filtering algorithm. To efficiently represent such a high-dimensional space we initialize samples using the responses of a low-level motion discontinuity detector. The formulation and computational model provide a general probabilistic framework for motion estimation with multiple, non-linear, models.

pdf pdf from publisher Video [BibTex]



Design and use of linear models for image motion analysis

Fleet, D. J., Black, M. J., Yacoob, Y., Jepson, A. D.

Int. J. of Computer Vision, 36(3):171-193, 2000 (article)

Abstract
Linear parameterized models of optical flow, particularly affine models, have become widespread in image motion analysis. The linear model coefficients are straightforward to estimate, and they provide reliable estimates of the optical flow of smooth surfaces. Here we explore the use of parameterized motion models that represent much more varied and complex motions. Our goals are threefold: to construct linear bases for complex motion phenomena; to estimate the coefficients of these linear models; and to recognize or classify image motions from the estimated coefficients. We consider two broad classes of motions: i) generic “motion features” such as motion discontinuities and moving bars; and ii) non-rigid, object-specific, motions such as the motion of human mouths. For motion features we construct a basis of steerable flow fields that approximate the motion features. For object-specific motions we construct basis flow fields from example motions using principal component analysis. In both cases, the model coefficients can be estimated directly from spatiotemporal image derivatives with a robust, multi-resolution scheme. Finally, we show how these model coefficients can be used to detect and recognize specific motions such as occlusion boundaries and facial expressions.

pdf [BibTex]



Robustly estimating changes in image appearance

Black, M. J., Fleet, D. J., Yacoob, Y.

Computer Vision and Image Understanding, 78(1):8-31, 2000 (article)

Abstract
We propose a generalized model of image “appearance change” in which brightness variation over time is represented as a probabilistic mixture of different causes. We define four generative models of appearance change due to (1) object or camera motion; (2) illumination phenomena; (3) specular reflections; and (4) “iconic changes” which are specific to the objects being viewed. These iconic changes include complex occlusion events and changes in the material properties of the objects. We develop a robust statistical framework for recovering these appearance changes in image sequences. This approach generalizes previous work on optical flow to provide a richer description of image events and more reliable estimates of image motion in the presence of shadows and specular reflections.

pdf pdf from publisher DOI [BibTex]


1998


Summarization of video-taped presentations: Automatic analysis of motion and gesture

Ju, S. X., Black, M. J., Minneman, S., Kimber, D.

IEEE Trans. on Circuits and Systems for Video Technology, 8(5):686-696, September 1998 (article)

Abstract
This paper presents an automatic system for analyzing and annotating video sequences of technical talks. Our method uses a robust motion estimation technique to detect key frames and segment the video sequence into subsequences containing a single overhead slide. The subsequences are stabilized to remove motion that occurs when the speaker adjusts their slides. Any changes remaining between frames in the stabilized sequences may be due to speaker gestures such as pointing or writing, and we use active contours to automatically track these potential gestures. Given the constrained domain, we define a simple set of actions that can be recognized based on the active contour shape and motion. The recognized actions provide an annotation of the sequence that can be used to access a condensed version of the talk from a Web page.

pdf pdf from publisher DOI [BibTex]



Robust anisotropic diffusion

Black, M. J., Sapiro, G., Marimont, D., Heeger, D.

IEEE Transactions on Image Processing, 7(3):421-432, March 1998 (article)

Abstract
Relations between anisotropic diffusion and robust statistics are described in this paper. Specifically, we show that anisotropic diffusion can be seen as a robust estimation procedure that estimates a piecewise smooth image from a noisy input image. The edge-stopping function in the anisotropic diffusion equation is closely related to the error norm and influence function in the robust estimation framework. This connection leads to a new edge-stopping function based on Tukey's biweight robust estimator that preserves sharper boundaries than previous formulations and improves the automatic stopping of the diffusion. The robust statistical interpretation also provides a means for detecting the boundaries (edges) between the piecewise smooth regions in an image that has been smoothed with anisotropic diffusion. Additionally, we derive a relationship between anisotropic diffusion and regularization with line processes. Adding constraints on the spatial organization of the line processes allows us to develop new anisotropic diffusion equations that result in a qualitative improvement in the continuity of edges.

pdf pdf from publisher [BibTex]
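
To make the edge-stopping idea in the abstract above concrete, here is a minimal 1D sketch. It assumes one common form of the Tukey biweight weight, (1 - (x/sigma)^2)^2 for |x| <= sigma and 0 otherwise, stated only up to a constant scale factor. The function names, step size, and the 1D explicit scheme are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def tukey_edge_stop(x, sigma):
    """Tukey biweight weight: (1 - (x/sigma)^2)^2 where |x| <= sigma, else 0."""
    x = np.asarray(x, dtype=float)
    w = (1.0 - (x / sigma) ** 2) ** 2
    return np.where(np.abs(x) <= sigma, w, 0.0)

def diffuse_1d(signal, sigma, step=0.25, iterations=50):
    """Explicit 1D diffusion in which flow across large differences is switched off."""
    u = np.asarray(signal, dtype=float).copy()
    for _ in range(iterations):
        d = np.diff(u)                        # d[i] = u[i + 1] - u[i]
        flow = tukey_edge_stop(d, sigma) * d  # zero wherever |d| exceeds sigma
        u[:-1] += step * flow                 # each sample moves toward its right neighbour
        u[1:] -= step * flow                  # and the neighbour moves back by the same amount
    return u

# Small ripples on both sides of a large step edge.
noisy = np.array([0.0, 0.1, 0.0, 0.1, 10.0, 10.1, 10.0, 10.1])
smoothed = diffuse_1d(noisy, sigma=1.0)
```

Because the weight is exactly zero beyond sigma, nothing flows across the large step, so the two sides are smoothed independently and the edge stays sharp.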



PLAYBOT: A visually-guided robot for physically disabled children

Tsotsos, J. K., Verghese, G., Dickinson, S., Jenkin, M., Jepson, A., Milios, E., Nuflo, F., Stevenson, S., Black, M., Metaxas, D., Culhane, S., Ye, Y., Mann, R.

Image & Vision Computing, Special Issue on Vision for the Disabled, 16(4):275-292, 1998 (article)

Abstract
This paper overviews the PLAYBOT project, a long-term, large-scale research program whose goal is to provide a directable robot which may enable physically disabled children to access and manipulate toys. This domain is the first test domain, but there is nothing inherent in the design of PLAYBOT that prohibits its extension to other tasks. The research is guided by several important goals: vision is the primary sensor; vision is task directed; the robot must be able to visually search its environment; object and event recognition are basic capabilities; environments must be natural and dynamic; users and environments are assumed to be unpredictable; task direction and reactivity must be smoothly integrated; and safety is of high importance. The emphasis of the research has been on vision for the robot, as this is the most challenging research aspect and the major bottleneck to the development of intelligent robots. Since the control framework is behavior-based, the visual capabilities of PLAYBOT are described in terms of visual behaviors. Many of the components of PLAYBOT are briefly described and several examples of implemented sub-systems are shown. The paper concludes with a description of the current overall system implementation, and a complete example of PLAYBOT performing a simple task.

pdf pdf from publisher DOI [BibTex]



EigenTracking: Robust matching and tracking of articulated objects using a view-based representation

Black, M. J., Jepson, A.

International Journal of Computer Vision, 26(1):63-84, 1998 (article)

Abstract
This paper describes an approach for tracking rigid and articulated objects using a view-based representation. The approach builds on and extends work on eigenspace representations, robust estimation techniques, and parameterized optical flow estimation. First, we note that the least-squares image reconstruction of standard eigenspace techniques has a number of problems and we reformulate the reconstruction problem as one of robust estimation. Second, we define a “subspace constancy assumption” that allows us to exploit techniques for parameterized optical flow estimation to simultaneously solve for the view of an object and the affine transformation between the eigenspace and the image. To account for large affine transformations between the eigenspace and the image, we define a multi-scale eigenspace representation and a coarse-to-fine matching strategy. Finally, we use these techniques to track objects over long image sequences in which the objects simultaneously undergo both affine image motions and changes of view. In particular, we use this “EigenTracking” technique to track and recognize the gestures of a moving hand.

pdf pdf from publisher video [BibTex]

1997


Recognizing facial expressions in image sequences using local parameterized models of image motion

Black, M. J., Yacoob, Y.

Int. Journal of Computer Vision, 25(1):23-48, 1997 (article)

Abstract
This paper explores the use of local parametrized models of image motion for recovering and recognizing the non-rigid and articulated motion of human faces. Parametric flow models (for example affine) are popular for estimating motion in rigid scenes. We observe that within local regions in space and time, such models not only accurately model non-rigid facial motions but also provide a concise description of the motion in terms of a small number of parameters. These parameters are intuitively related to the motion of facial features during facial expressions and we show how expressions such as anger, happiness, surprise, fear, disgust, and sadness can be recognized from the local parametric motions in the presence of significant head motion. The motion tracking and expression recognition approach performed with high accuracy in extensive laboratory experiments involving 40 subjects as well as in television and movie sequences.

pdf pdf from publisher abstract video [BibTex]