Header logo is ps


2019


Thumb xl celia
Decoding subcategories of human bodies from both body- and face-responsive cortical regions

Foster, C., Zhao, M., Romero, J., Black, M. J., Mohler, B. J., Bartels, A., Bülthoff, I.

NeuroImage, 202(15):116085, November 2019 (article)

Abstract
Our visual system can easily categorize objects (e.g. faces vs. bodies) and further differentiate them into subcategories (e.g. male vs. female). This ability is particularly important for objects of social significance, such as human faces and bodies. While many studies have demonstrated category selectivity to faces and bodies in the brain, how subcategories of faces and bodies are represented remains unclear. Here, we investigated how the brain encodes two prominent subcategories shared by both faces and bodies, sex and weight, and whether neural responses to these subcategories rely on low-level visual, high-level visual or semantic similarity. We recorded brain activity with fMRI while participants viewed faces and bodies that varied in sex, weight, and image size. The results showed that the sex of bodies can be decoded from both body- and face-responsive brain areas, with the former exhibiting more consistent size-invariant decoding than the latter. Body weight could also be decoded in face-responsive areas and in distributed body-responsive areas, and this decoding was also invariant to image size. The weight of faces could be decoded from the fusiform body area (FBA), and weight could be decoded across face and body stimuli in the extrastriate body area (EBA) and a distributed body-responsive area. The sex of well-controlled faces (e.g. excluding hairstyles) could not be decoded from face- or body-responsive regions. These results demonstrate that both face- and body-responsive brain regions encode information that can distinguish the sex and weight of bodies. Moreover, the neural patterns corresponding to sex and weight were invariant to image size and could sometimes generalize across face and body stimuli, suggesting that such subcategorical information is encoded with a high-level visual or semantic code.

paper pdf DOI [BibTex]

2019

paper pdf DOI [BibTex]


Thumb xl multihumanoflow thumb
Learning Multi-Human Optical Flow

Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., Black, M. J.

arxiv preprint arXiv:1910.1166, November 2019 (article)

Abstract
The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by them does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single-and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they can generalize well to real image sequences. The code, trained models and the dataset are available for research.

Paper poster link (url) [BibTex]


Thumb xl autonomous mocap cover image new
Active Perception based Formation Control for Multiple Aerial Vehicles

Tallamraju, R., Price, E., Ludwig, R., Karlapalem, K., Bülthoff, H. H., Black, M. J., Ahmad, A.

IEEE Robotics and Automation Letters, Robotics and Automation Letters, 4(4):4491-4498, IEEE, October 2019 (article)

Abstract
We present a novel robotic front-end for autonomous aerial motion-capture (mocap) in outdoor environments. In previous work, we presented an approach for cooperative detection and tracking (CDT) of a subject using multiple micro-aerial vehicles (MAVs). However, it did not ensure optimal view-point configurations of the MAVs to minimize the uncertainty in the person's cooperatively tracked 3D position estimate. In this article, we introduce an active approach for CDT. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs can actively compute optimal local motion plans, resulting in optimal view-point configurations, which minimize the uncertainty in the tracked estimate. We achieve this by decoupling the goal of active tracking into a quadratic objective and non-convex constraints corresponding to angular configurations of the MAVs w.r.t. the person. We derive this decoupling using Gaussian observation model assumptions within the CDT algorithm. We preserve convexity in optimization by embedding all the non-convex constraints, including those for dynamic obstacle avoidance, as external control inputs in the MPC dynamics. Multiple real robot experiments and comparisons involving 3 MAVs in several challenging scenarios are presented.

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


Thumb xl 3dmm
3D Morphable Face Models - Past, Present and Future

Egger, B., Smith, W. A. P., Tewari, A., Wuhrer, S., Zollhoefer, M., Beeler, T., Bernard, F., Bolkart, T., Kortylewski, A., Romdhani, S., Theobalt, C., Blanz, V., Vetter, T.

arxiv preprint arXiv:1909.01815, September 2019 (article)

Abstract
In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation,and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.

paper project page [BibTex]

paper project page [BibTex]


Thumb xl hessepami
Learning and Tracking the 3D Body Shape of Freely Moving Infants from RGB-D sequences

Hesse, N., Pujades, S., Black, M., Arens, M., Hofmann, U., Schroeder, S.

Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019 (article)

Abstract
Statistical models of the human body surface are generally learned from thousands of high-quality 3D scans in predefined poses to cover the wide variety of human body shapes and articulations. Acquisition of such data requires expensive equipment, calibration procedures, and is limited to cooperative subjects who can understand and follow instructions, such as adults. We present a method for learning a statistical 3D Skinned Multi-Infant Linear body model (SMIL) from incomplete, low-quality RGB-D sequences of freely moving infants. Quantitative experiments show that SMIL faithfully represents the RGB-D data and properly factorizes the shape and pose of the infants. To demonstrate the applicability of SMIL, we fit the model to RGB-D sequences of freely moving infants and show, with a case study, that our method captures enough motion detail for General Movements Assessment (GMA), a method used in clinical practice for early detection of neurodevelopmental disorders in infants. SMIL provides a new tool for analyzing infant shape and movement and is a step towards an automated system for GMA.

pdf Journal DOI [BibTex]

pdf Journal DOI [BibTex]


Thumb xl kenny
Perceptual Effects of Inconsistency in Human Animations

Kenny, S., Mahmood, N., Honda, C., Black, M. J., Troje, N. F.

ACM Trans. Appl. Percept., 16(1):2:1-2:18, Febuary 2019 (article)

Abstract
The individual shape of the human body, including the geometry of its articulated structure and the distribution of weight over that structure, influences the kinematics of a person’s movements. How sensitive is the visual system to inconsistencies between shape and motion introduced by retargeting motion from one person onto the shape of another? We used optical motion capture to record five pairs of male performers with large differences in body weight, while they pushed, lifted, and threw objects. From these data, we estimated both the kinematics of the actions as well as the performer’s individual body shape. To obtain consistent and inconsistent stimuli, we created animated avatars by combining the shape and motion estimates from either a single performer or from different performers. Using these stimuli we conducted three experiments in an immersive virtual reality environment. First, a group of participants detected which of two stimuli was inconsistent. Performance was very low, and results were only marginally significant. Next, a second group of participants rated perceived attractiveness, eeriness, and humanness of consistent and inconsistent stimuli, but these judgements of animation characteristics were not affected by consistency of the stimuli. Finally, a third group of participants rated properties of the objects rather than of the performers. Here, we found strong influences of shape-motion inconsistency on perceived weight and thrown distance of objects. This suggests that the visual system relies on its knowledge of shape and motion and that these components are assimilated into an altered perception of the action outcome. We propose that the visual system attempts to resist inconsistent interpretations of human animations. Actions involving object manipulations present an opportunity for the visual system to reinterpret the introduced inconsistencies as a change in the dynamics of an object rather than as an unexpected combination of body shape and body motion.

publisher pdf DOI [BibTex]

publisher pdf DOI [BibTex]


Thumb xl screenshot from 2019 12 09 17 01 24
Generating 3D People in Scenes without People

Zhang, Y., Hassan, M., Neumann, H., Black, M. J., Tang, S.

arXiv:1912.02923, 2019 (article)

Abstract
We present a fully-automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene. Given a 3D scene without people, humans can easily imagine how people could interact with the scene and the objects in it. However, this is a challenging task for a computer as solving it requires (1) the generated human bodies should be semantically plausible with the 3D environment, e.g. people sitting on the sofa or cooking near the stove; (2) the generated human-scene interaction should be physically feasible in the way that the human body and scene do not interpenetrate while, at the same time, body-scene contact supports physical interactions. To that end, we make use of the surface-based 3D human model SMPL-X. We first train a conditional variational autoencoder to predict semantically plausible 3D human pose conditioned on latent scene representations, then we further refine the generated 3D bodies using scene constraints to enforce feasible physical interaction. We show that our approach is able to synthesize realistic and expressive 3D human bodies that naturally interact with 3D environment. We perform extensive experiments demonstrating that our generative framework compares favorably with existing methods, both qualitatively and quantitatively. We believe that our scene-conditioned 3D human generation pipeline will be useful for numerous applications; e.g. to generate training data for human pose estimation, in video games and in VR/AR.

PDF link (url) [BibTex]


Thumb xl virtualcaliper
The Virtual Caliper: Rapid Creation of Metrically Accurate Avatars from 3D Measurements

Pujades, S., Mohler, B., Thaler, A., Tesch, J., Mahmood, N., Hesse, N., Bülthoff, H. H., Black, M. J.

IEEE Transactions on Visualization and Computer Graphics, 25, pages: 1887,1897, IEEE, 2019 (article)

Abstract
Creating metrically accurate avatars is important for many applications such as virtual clothing try-on, ergonomics, medicine, immersive social media, telepresence, and gaming. Creating avatars that precisely represent a particular individual is challenging however, due to the need for expensive 3D scanners, privacy issues with photographs or videos, and difficulty in making accurate tailoring measurements. We overcome these challenges by creating “The Virtual Caliper”, which uses VR game controllers to make simple measurements. First, we establish what body measurements users can reliably make on their own body. We find several distance measurements to be good candidates and then verify that these are linearly related to 3D body shape as represented by the SMPL body model. The Virtual Caliper enables novice users to accurately measure themselves and create an avatar with their own body shape. We evaluate the metric accuracy relative to ground truth 3D body scan data, compare the method quantitatively to other avatar creation tools, and perform extensive perceptual studies. We also provide a software application to the community that enables novices to rapidly create avatars in fewer than five minutes. Not only is our approach more rapid than existing methods, it exports a metrically accurate 3D avatar model that is rigged and skinned.

Project Page IEEE Open Access IEEE Open Access PDF DOI [BibTex]

Project Page IEEE Open Access IEEE Open Access PDF DOI [BibTex]

2017


Thumb xl flamewebteaserwide
Learning a model of facial shape and expression from 4D scans

Li, T., Bolkart, T., Black, M. J., Li, H., Romero, J.

ACM Transactions on Graphics, 36(6):194:1-194:17, November 2017, Two first authors contributed equally (article)

Abstract
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression from 4D face sequences in the D3DFACS dataset along with additional 4D sequences.We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33, 000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]

2017

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]


Thumb xl molbert
Investigating Body Image Disturbance in Anorexia Nervosa Using Novel Biometric Figure Rating Scales: A Pilot Study

Mölbert, S. C., Thaler, A., Streuber, S., Black, M. J., Karnath, H., Zipfel, S., Mohler, B., Giel, K. E.

European Eating Disorders Review, 25(6):607-612, November 2017 (article)

Abstract
This study uses novel biometric figure rating scales (FRS) spanning body mass index (BMI) 13.8 to 32.2 kg/m2 and BMI 18 to 42 kg/m2. The aims of the study were (i) to compare FRS body weight dissatisfaction and perceptual distortion of women with anorexia nervosa (AN) to a community sample; (ii) how FRS parameters are associated with questionnaire body dissatisfaction, eating disorder symptoms and appearance comparison habits; and (iii) whether the weight spectrum of the FRS matters. Women with AN (n = 24) and a community sample of women (n = 104) selected their current and ideal body on the FRS and completed additional questionnaires. Women with AN accurately picked the body that aligned best with their actual weight in both FRS. Controls underestimated their BMI in the FRS 14–32 and were accurate in the FRS 18–42. In both FRS, women with AN desired a body close to their actual BMI and controls desired a thinner body. Our observations suggest that body image disturbance in AN is unlikely to be characterized by a visual perceptual disturbance, but rather by an idealization of underweight in conjunction with high body dissatisfaction. The weight spectrum of FRS can influence the accuracy of BMI estimation.

publisher DOI Project Page [BibTex]


Thumb xl manoteaser
Embodied Hands: Modeling and Capturing Hands and Bodies Together

Romero, J., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):245:1-245:17, 245:1–245:17, ACM, November 2017 (article)

Abstract
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low-resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural, activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full body models that move naturally with detailed hand motions and a realism not seen before in full body performance capture. The models and data are freely available for research purposes at http://mano.is.tue.mpg.de.

website youtube paper suppl video link (url) DOI Project Page [BibTex]

website youtube paper suppl video link (url) DOI Project Page [BibTex]


Thumb xl cover tro paper
An Online Scalable Approach to Unified Multirobot Cooperative Localization and Object Tracking

Ahmad, A., Lawless, G., Lima, P.

IEEE Transactions on Robotics (T-RO), 33, pages: 1184 - 1199, October 2017 (article)

Abstract
In this article we present a unified approach for multi-robot cooperative simultaneous localization and object tracking based on particle filters. Our approach is scalable with respect to the number of robots in the team. We introduce a method that reduces, from an exponential to a linear growth, the space and computation time requirements with respect to the number of robots in order to maintain a given level of accuracy in the full state estimation. Our method requires no increase in the number of particles with respect to the number of robots. However, in our method each particle represents a full state hypothesis, leading to the linear dependency on the number of robots of both space and time complexity. The derivation of the algorithm implementing our approach from a standard particle filter algorithm and its complexity analysis are presented. Through an extensive set of simulation experiments on a large number of randomized datasets, we demonstrate the correctness and efficacy of our approach. Through real robot experiments on a standardized open dataset of a team of four soccer playing robots tracking a ball, we evaluate our method's estimation accuracy with respect to the ground truth values. Through comparisons with other methods based on i) nonlinear least squares minimization and ii) joint extended Kalman filter, we further highlight our method's advantages. Finally, we also present a robustness test for our approach by evaluating it under scenarios of communication and vision failure in teammate robots.

Published Version link (url) DOI [BibTex]


Thumb xl early stopping teaser
Early Stopping Without a Validation Set

Mahsereci, M., Balles, L., Lassner, C., Hennig, P.

arXiv preprint arXiv:1703.09580, 2017 (article)

Abstract
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.

link (url) Project Page Project Page [BibTex]


Thumb xl web image
Data-Driven Physics for Human Soft Tissue Animation

Kim, M., Pons-Moll, G., Pujades, S., Bang, S., Kim, J., Black, M. J., Lee, S.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):54:1-54:12, 2017 (article)

Abstract
Data driven models of human poses and soft-tissue deformations can produce very realistic results, but they only model the visible surface of the human body and cannot create skin deformation due to interactions with the environment. Physical simulations can generalize to external forces, but their parameters are difficult to control. In this paper, we present a layered volumetric human body model learned from data. Our model is composed of a data-driven inner layer and a physics-based external layer. The inner layer is driven with a volumetric statistical body model (VSMPL). The soft tissue layer consists of a tetrahedral mesh that is driven using the finite element method (FEM). Model parameters, namely the segmentation of the body into layers and the soft tissue elasticity, are learned directly from 4D registrations of humans exhibiting soft tissue deformations. The learned two layer model is a realistic full-body avatar that generalizes to novel motions and external forces. Experiments show that the resulting avatars produce realistic results on held out sequences and react to external forces. Moreover, the model supports the retargeting of physical properties from one avatar when they share the same topology.

video paper link (url) Project Page [BibTex]

video paper link (url) Project Page [BibTex]


Thumb xl web teaser eg
Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

(Best Paper, Eurographics 2017)

Marcard, T. V., Rosenhahn, B., Black, M., Pons-Moll, G.

Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics), pages: 349-360 , 2017 (article)

Abstract
We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker Sparse Inertial Poser (SIP) enables motion capture using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall

video pdf Project Page [BibTex]

video pdf Project Page [BibTex]


Thumb xl pami 2017 teaser
Efficient 2D and 3D Facade Segmentation using Auto-Context

Gadde, R., Jampani, V., Marlet, R., Gehler, P.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017 (article)

Abstract
This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we are describing a system that is almost domain independent and consists of standard segmentation methods. We train a sequence of boosted decision trees using auto-context features. This is learned using stacked generalization. We find that this technique performs better, or comparable with all previous published methods and present empirical results on all available 2D and 3D facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test-time inference.

arXiv Project Page [BibTex]

arXiv Project Page [BibTex]


Thumb xl web image
ClothCap: Seamless 4D Clothing Capture and Retargeting

Pons-Moll, G., Pujades, S., Hu, S., Black, M.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):73:1-73:15, ACM, New York, NY, USA, 2017, Two first authors contributed equally (article)

Abstract
Designing and simulating realistic clothing is challenging and, while several methods have addressed the capture of clothing from 3D scans, previous methods have been limited to single garments and simple motions, lack detail, or require specialized texture patterns. Here we address the problem of capturing regular clothing on fully dressed people in motion. People typically wear multiple pieces of clothing at a time. To estimate the shape of such clothing, track it over time, and render it believably, each garment must be segmented from the others and the body. Our ClothCap approach uses a new multi-part 3D model of clothed bodies, automatically segments each piece of clothing, estimates the naked body shape and pose under the clothing, and tracks the 3D deformations of the clothing over time. We estimate the garments and their motion from 4D scans; that is, high-resolution 3D scans of the subject in motion at 60 fps. The model allows us to capture a clothed person in motion, extract their clothing, and retarget the clothing to new body shapes. ClothCap provides a step towards virtual try-on with a technology for capturing, modeling, and analyzing clothing in motion.

video project_page paper link (url) DOI Project Page Project Page [BibTex]

video project_page paper link (url) DOI Project Page Project Page [BibTex]

2012


Thumb xl eigenmaps
An SVD-Based Approach for Ghost Detection and Removal in High Dynamic Range Images

Srikantha, A., Sidibe, D., Meriaudeau, F.

International Conference on Pattern Recognition (ICPR), pages: 380-383, November 2012 (article)

pdf [BibTex]

2012

pdf [BibTex]


Thumb xl posear
Coupled Action Recognition and Pose Estimation from Multiple Views

Yao, A., Gall, J., van Gool, L.

International Journal of Computer Vision, 100(1):16-37, October 2012 (article)

publisher's site code pdf Project Page Project Page Project Page [BibTex]

publisher's site code pdf Project Page Project Page Project Page [BibTex]


Thumb xl representativecrop
DRAPE: DRessing Any PErson

Guan, P., Reiss, L., Hirshberg, D., Weiss, A., Black, M. J.

ACM Trans. on Graphics (Proc. SIGGRAPH), 31(4):35:1-35:10, July 2012 (article)

Abstract
We describe a complete system for animating realistic clothing on synthetic bodies of any shape and pose without manual intervention. The key component of the method is a model of clothing called DRAPE (DRessing Any PErson) that is learned from a physics-based simulation of clothing on bodies of different shapes and poses. The DRAPE model has the desirable property of "factoring" clothing deformations due to body shape from those due to pose variation. This factorization provides an approximation to the physical clothing deformation and greatly simplifies clothing synthesis. Given a parameterized model of the human body with known shape and pose parameters, we describe an algorithm that dresses the body with a garment that is customized to fit and possesses realistic wrinkles. DRAPE can be used to dress static bodies or animated sequences with a learned model of the cloth dynamics. Since the method is fully automated, it is appropriate for dressing large numbers of virtual characters of varying shape. The method is significantly more efficient than physical simulation.

YouTube pdf talk Project Page Project Page [BibTex]

YouTube pdf talk Project Page Project Page [BibTex]


Thumb xl ghosthdr
Ghost Detection and Removal for High Dynamic Range Images: Recent Advances

Srikantha, A., Sidib’e, D.

Signal Processing: Image Communication, 27, pages: 650-662, July 2012 (article)

pdf link (url) [BibTex]

pdf link (url) [BibTex]


Thumb xl thumb screen shot 2012 10 06 at 11.48.38 am
Visual Servoing on Unknown Objects

Gratal, X., Romero, J., Bohg, J., Kragic, D.

Mechatronics, 22(4):423-435, Elsevier, June 2012, Visual Servoing \{SI\} (article)

Abstract
We study visual servoing in a framework of detection and grasping of unknown objects. Classically, visual servoing has been used for applications where the object to be servoed on is known to the robot prior to the task execution. In addition, most of the methods concentrate on aligning the robot hand with the object without grasping it. In our work, visual servoing techniques are used as building blocks in a system capable of detecting and grasping unknown objects in natural scenes. We show how different visual servoing techniques facilitate a complete grasping cycle.

Grasping sequence video Offline calibration video Pdf DOI [BibTex]

Grasping sequence video Offline calibration video Pdf DOI [BibTex]


Thumb xl jneuroscicrop
Visual Orientation and Directional Selectivity Through Thalamic Synchrony

Stanley, G., Jin, J., Wang, Y., Desbordes, G., Wang, Q., Black, M., Alonso, J.

Journal of Neuroscience, 32(26):9073-9088, June 2012 (article)

Abstract
Thalamic neurons respond to visual scenes by generating synchronous spike trains on the timescale of 10–20 ms that are very effective at driving cortical targets. Here we demonstrate that this synchronous activity contains unexpectedly rich information about fundamental properties of visual stimuli. We report that the occurrence of synchronous firing of cat thalamic cells with highly overlapping receptive fields is strongly sensitive to the orientation and the direction of motion of the visual stimulus. We show that this stimulus selectivity is robust, remaining relatively unchanged under different contrasts and temporal frequencies (stimulus velocities). A computational analysis based on an integrate-and-fire model of the direct thalamic input to a layer 4 cortical cell reveals a strong correlation between the degree of thalamic synchrony and the nonlinear relationship between cortical membrane potential and the resultant firing rate. Together, these findings suggest a novel population code in the synchronous firing of neurons in the early visual pathway that could serve as the substrate for establishing cortical representations of the visual scene.

preprint publisher's site Project Page [BibTex]

preprint publisher's site Project Page [BibTex]


Thumb xl bilinear
Bilinear Spatiotemporal Basis Models

Akhter, I., Simon, T., Khan, S., Matthews, I., Sheikh, Y.

ACM Transactions on Graphics (TOG), 31(2):17, ACM, April 2012 (article)

Abstract
A variety of dynamic objects, such as faces, bodies, and cloth, are represented in computer graphics as a collection of moving spatial landmarks. Spatiotemporal data is inherent in a number of graphics applications including animation, simulation, and object and camera tracking. The principal modes of variation in the spatial geometry of objects are typically modeled using dimensionality reduction techniques, while concurrently, trajectory representations like splines and autoregressive models are widely used to exploit the temporal regularity of deformation. In this article, we present the bilinear spatiotemporal basis as a model that simultaneously exploits spatial and temporal regularity while maintaining the ability to generalize well to new sequences. This factorization allows the use of analytical, predefined functions to represent temporal variation (e.g., B-Splines or the Discrete Cosine Transform) resulting in efficient model representation and estimation. The model can be interpreted as representing the data as a linear combination of spatiotemporal sequences consisting of shape modes oscillating over time at key frequencies. We apply the bilinear model to natural spatiotemporal phenomena, including face, body, and cloth motion data, and compare it in terms of compaction, generalization ability, predictive precision, and efficiency to existing models. We demonstrate the application of the model to a number of graphics tasks including labeling, gap-filling, denoising, and motion touch-up.

pdf project page link (url) [BibTex]

pdf project page link (url) [BibTex]


Thumb xl thumb latent space2
A metric for comparing the anthropomorphic motion capability of artificial hands

Feix, T., Romero, J., Ek, C. H., Schmiedmayer, H., Kragic, D.

IEEE RAS Transactions on Robotics, TRO, pages: 974-980, 2012 (article)

Publisher site Human Grasping Database Project [BibTex]

Publisher site Human Grasping Database Project [BibTex]


Thumb xl rat4
The Ankyrin 3 (ANK3) Bipolar Disorder Gene Regulates Psychiatric-related Behaviors that are Modulated by Lithium and Stress

Leussis, M., Berry-Scott, E., Saito, M., Jhuang, H., Haan, G., Alkan, O., Luce, C., Madison, J., Sklar, P., Serre, T., Root, D., Petryshen, T.

Biological Psychiatry , 2012 (article)

Prepublication Article Abstract [BibTex]

Prepublication Article Abstract [BibTex]


Thumb xl imavis2012
Natural Metrics and Least-Committed Priors for Articulated Tracking

Soren Hauberg, Stefan Sommer, Kim S. Pedersen

Image and Vision Computing, 30(6-7):453-461, Elsevier, 2012 (article)

Publishers site Code PDF [BibTex]

Publishers site Code PDF [BibTex]

2004


Thumb xl woodtransbme04
On the variability of manual spike sorting

Wood, F., Black, M. J., Vargas-Irwin, C., Fellows, M., Donoghue, J. P.

IEEE Trans. Biomedical Engineering, 51(6):912-918, June 2004 (article)

pdf pdf from publisher [BibTex]

2004

pdf pdf from publisher [BibTex]


Thumb xl wutransbme04
Modeling and decoding motor cortical activity using a switching Kalman filter

Wu, W., Black, M. J., Mumford, D., Gao, Y., Bienenstock, E., Donoghue, J. P.

IEEE Trans. Biomedical Engineering, 51(6):933-942, June 2004 (article)

Abstract
We present a switching Kalman filter model for the real-time inference of hand kinematics from a population of motor cortical neurons. Firing rates are modeled as a Gaussian mixture where the mean of each Gaussian component is a linear function of hand kinematics. A “hidden state” models the probability of each mixture component and evolves over time in a Markov chain. The model generalizes previous encoding and decoding methods, addresses the non-Gaussian nature of firing rates, and can cope with crudely sorted neural data common in on-line prosthetic applications.

pdf pdf from publisher [BibTex]

pdf pdf from publisher [BibTex]

1998


Thumb xl bildschirmfoto 2012 12 06 um 10.05.20
Summarization of video-taped presentations: Automatic analysis of motion and gesture

Ju, S. X., Black, M. J., Minneman, S., Kimber, D.

IEEE Trans. on Circuits and Systems for Video Technology, 8(5):686-696, September 1998 (article)

Abstract
This paper presents an automatic system for analyzing and annotating video sequences of technical talks. Our method uses a robust motion estimation technique to detect key frames and segment the video sequence into subsequences containing a single overhead slide. The subsequences are stabilized to remove motion that occurs when the speaker adjusts their slides. Any changes remaining between frames in the stabilized sequences may be due to speaker gestures such as pointing or writing, and we use active contours to automatically track these potential gestures. Given the constrained domain, we define a simple set of actions that can be recognized based on the active contour shape and motion. The recognized actions provide an annotation of the sequence that can be used to access a condensed version of the talk from a Web page.

pdf pdf from publisher DOI [BibTex]

1998

pdf pdf from publisher DOI [BibTex]


Thumb xl bildschirmfoto 2012 12 06 um 12.22.18
Robust anisotropic diffusion

Black, M. J., Sapiro, G., Marimont, D., Heeger, D.

IEEE Transactions on Image Processing, 7(3):421-432, March 1998 (article)

Abstract
Relations between anisotropic diffusion and robust statistics are described in this paper. Specifically, we show that anisotropic diffusion can be seen as a robust estimation procedure that estimates a piecewise smooth image from a noisy input image. The edge-stopping; function in the anisotropic diffusion equation is closely related to the error norm and influence function in the robust estimation framework. This connection leads to a new edge-stopping; function based on Tukey's biweight robust estimator that preserves sharper boundaries than previous formulations and improves the automatic stopping of the diffusion. The robust statistical interpretation also provides a means for detecting the boundaries (edges) between the piecewise smooth regions in an image that has been smoothed with anisotropic diffusion. Additionally, we derive a relationship between anisotropic diffusion and regularization with line processes. Adding constraints on the spatial organization of the line processes allows us to develop new anisotropic diffusion equations that result in a qualitative improvement in the continuity of edges

pdf pdf from publisher [BibTex]

pdf pdf from publisher [BibTex]


Thumb xl paybotteaser
PLAYBOT: A visually-guided robot for physically disabled children

Tsotsos, J. K., Verghese, G., Dickinson, S., Jenkin, M., Jepson, A., Milios, E., Nuflo, F., Stevenson, S., Black, M., Metaxas, D., Culhane, S., Ye, Y., Mann, R.

Image & Vision Computing, Special Issue on Vision for the Disabled, 16(4):275-292, 1998 (article)

Abstract
This paper overviews the PLAYBOT project, a long-term, large-scale research program whose goal is to provide a directable robot which may enable physically disabled children to access and manipulate toys. This domain is the first test domain, but there is nothing inherent in the design of PLAYBOT that prohibits its extension to other tasks. The research is guided by several important goals: vision is the primary sensor; vision is task directed; the robot must be able to visually search its environment; object and event recognition are basic capabilities; environments must be natural and dynamic; users and environments are assumed to be unpredictable; task direction and reactivity must be smoothly integrated; and safety is of high importance. The emphasis of the research has been on vision for the robot this is the most challenging research aspect and the major bottleneck to the development of intelligent robots. Since the control framework is behavior-based, the visual capabilities of PLAYBOT are described in terms of visual behaviors. Many of the components of PLAYBOT are briefly described and several examples of implemented sub-systems are shown. The paper concludes with a description of the current overall system implementation, and a complete example of PLAYBOT performing a simple task.

pdf pdf from publisher DOI [BibTex]

pdf pdf from publisher DOI [BibTex]


Thumb xl bildschirmfoto 2012 12 06 um 12.33.38
EigenTracking: Robust matching and tracking of articulated objects using a view-based representation

Black, M. J., Jepson, A.

International Journal of Computer Vision, 26(1):63-84, 1998 (article)

Abstract
This paper describes an approach for tracking rigid and articulated objects using a view-based representation. The approach builds on and extends work on eigenspace representations, robust estimation techniques, and parameterized optical flow estimation. First, we note that the least-squares image reconstruction of standard eigenspace techniques has a number of problems and we reformulate the reconstruction problem as one of robust estimation. Second we define a “subspace constancy assumption” that allows us to exploit techniques for parameterized optical flow estimation to simultaneously solve for the view of an object and the affine transformation between the eigenspace and the image. To account for large affine transformations between the eigenspace and the image we define a multi-scale eigenspace representation and a coarse-to-fine matching strategy. Finally, we use these techniques to track objects over long image sequences in which the objects simultaneously undergo both affine image motions and changes of view. In particular we use this “EigenTracking” technique to track and recognize the gestures of a moving hand.

pdf pdf from publisher video [BibTex]