Header logo is ps


2018


Thumb xl dip final
Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time

Huang, Y., Kaufmann, M., Aksan, E., Black, M. J., Hilliges, O., Pons-Moll, G.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 37, pages: 185:1-185:15, ACM, November 2018, Two first authors contributed equally (article)

Abstract
We demonstrate a novel deep neural network capable of reconstructing human full body pose in real-time from 6 Inertial Measurement Units (IMUs) worn on the user's body. In doing so, we address several difficult challenges. First, the problem is severely under-constrained as multiple pose parameters produce the same IMU orientations. Second, capturing IMU data in conjunction with ground-truth poses is expensive and difficult to do in many target application scenarios (e.g., outdoors). Third, modeling temporal dependencies through non-linear optimization has proven effective in prior work but makes real-time prediction infeasible. To address this important limitation, we learn the temporal pose priors using deep learning. To learn from sufficient data, we synthesize IMU data from motion capture datasets. A bi-directional RNN architecture leverages past and future information that is available at training time. At test time, we deploy the network in a sliding window fashion, retaining real time capabilities. To evaluate our method, we recorded DIP-IMU, a dataset consisting of 10 subjects wearing 17 IMUs for validation in 64 sequences with 330,000 time instants; this constitutes the largest IMU dataset publicly available. We quantitatively evaluate our approach on multiple datasets and show results from a real-time implementation. DIP-IMU and the code are available for research purposes.

data code pdf preprint video DOI Project Page [BibTex]

2018

data code pdf preprint video DOI Project Page [BibTex]


Thumb xl cover
Deep Neural Network-based Cooperative Visual Tracking through Multiple Micro Aerial Vehicles

Price, E., Lawless, G., Ludwig, R., Martinovic, I., Buelthoff, H. H., Black, M. J., Ahmad, A.

IEEE Robotics and Automation Letters, Robotics and Automation Letters, 3(4):3193-3200, IEEE, October 2018, Also accepted and presented in the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). (article)

Abstract
Multi-camera tracking of humans and animals in outdoor environments is a relevant and challenging problem. Our approach to it involves a team of cooperating micro aerial vehicles (MAVs) with on-board cameras only. DNNs often fail at objects with small scale or far away from the camera, which are typical characteristics of a scenario with aerial robots. Thus, the core problem addressed in this paper is how to achieve on-board, online, continuous and accurate vision-based detections using DNNs for visual person tracking through MAVs. Our solution leverages cooperation among multiple MAVs and active selection of most informative regions of image. We demonstrate the efficiency of our approach through simulations with up to 16 robots and real robot experiments involving two aerial robots tracking a person, while maintaining an active perception-driven formation. ROS-based source code is provided for the benefit of the community.

Published Version link (url) DOI [BibTex]

Published Version link (url) DOI [BibTex]


Thumb xl alice
First Impressions of Personality Traits From Body Shapes

Hu, Y., Parde, C. J., Hill, M. Q., Mahmood, N., O’Toole, A. J.

Psychological Science, 29(12):1969-–1983, October 2018 (article)

Abstract
People infer the personalities of others from their facial appearance. Whether they do so from body shapes is less studied. We explored personality inferences made from body shapes. Participants rated personality traits for male and female bodies generated with a three-dimensional body model. Multivariate spaces created from these ratings indicated that people evaluate bodies on valence and agency in ways that directly contrast positive and negative traits from the Big Five domains. Body-trait stereotypes based on the trait ratings revealed a myriad of diverse body shapes that typify individual traits. Personality-trait profiles were predicted reliably from a subset of the body-shape features used to specify the three-dimensional bodies. Body features related to extraversion and conscientiousness were predicted with the highest consensus, followed by openness traits. This study provides the first comprehensive look at the range, diversity, and reliability of personality inferences that people make from body shapes.

publisher site pdf DOI [BibTex]

publisher site pdf DOI [BibTex]


Thumb xl fict 05 00018 g003
Visual Perception and Evaluation of Photo-Realistic Self-Avatars From 3D Body Scans in Males and Females

Thaler, A., Piryankova, I., Stefanucci, J. K., Pujades, S., de la Rosa, S., Streuber, S., Romero, J., Black, M. J., Mohler, B. J.

Frontiers in ICT, 5, pages: 1-14, September 2018 (article)

Abstract
The creation or streaming of photo-realistic self-avatars is important for virtual reality applications that aim for perception and action to replicate real world experience. The appearance and recognition of a digital self-avatar may be especially important for applications related to telepresence, embodied virtual reality, or immersive games. We investigated gender differences in the use of visual cues (shape, texture) of a self-avatar for estimating body weight and evaluating avatar appearance. A full-body scanner was used to capture each participant's body geometry and color information and a set of 3D virtual avatars with realistic weight variations was created based on a statistical body model. Additionally, a second set of avatars was created with an average underlying body shape matched to each participant’s height and weight. In four sets of psychophysical experiments, the influence of visual cues on the accuracy of body weight estimation and the sensitivity to weight changes was assessed by manipulating body shape (own, average) and texture (own photo-realistic, checkerboard). The avatars were presented on a large-screen display, and participants responded to whether the avatar's weight corresponded to their own weight. Participants also adjusted the avatar's weight to their desired weight and evaluated the avatar's appearance with regard to similarity to their own body, uncanniness, and their willingness to accept it as a digital representation of the self. The results of the psychophysical experiments revealed no gender difference in the accuracy of estimating body weight in avatars. However, males accepted a larger weight range of the avatars as corresponding to their own. In terms of the ideal body weight, females but not males desired a thinner body. With regard to the evaluation of avatar appearance, the questionnaire responses suggest that own photo-realistic texture was more important to males for higher similarity ratings, while own body shape seemed to be more important to females. These results argue for gender-specific considerations when creating self-avatars.

pdf DOI [BibTex]

pdf DOI [BibTex]


Thumb xl mazen
Robust Physics-based Motion Retargeting with Realistic Body Shapes

Borno, M. A., Righetti, L., Black, M. J., Delp, S. L., Fiume, E., Romero, J.

Computer Graphics Forum, 37, pages: 6:1-12, July 2018 (article)

Abstract
Motion capture is often retargeted to new, and sometimes drastically different, characters. When the characters take on realistic human shapes, however, we become more sensitive to the motion looking right. This means adapting it to be consistent with the physical constraints imposed by different body shapes. We show how to take realistic 3D human shapes, approximate them using a simplified representation, and animate them so that they move realistically using physically-based retargeting. We develop a novel spacetime optimization approach that learns and robustly adapts physical controllers to new bodies and constraints. The approach automatically adapts the motion of the mocap subject to the body shape of a target subject. This motion respects the physical properties of the new body and every body shape results in a different and appropriate movement. This makes it easy to create a varied set of motions from a single mocap sequence by simply varying the characters. In an interactive environment, successful retargeting requires adapting the motion to unexpected external forces. We achieve robustness to such forces using a novel LQR-tree formulation. We show that the simulated motions look appropriate to each character’s anatomy and their actions are robust to perturbations.

pdf video Project Page Project Page [BibTex]

pdf video Project Page Project Page [BibTex]


Thumb xl thesis cover2
Model-based Optical Flow: Layers, Learning, and Geometry

Wulff, J.

Tuebingen University, April 2018 (phdthesis)

Abstract
The estimation of motion in video sequences establishes temporal correspondences between pixels and surfaces and allows reasoning about a scene using multiple frames. Despite being a focus of research for over three decades, computing motion, or optical flow, remains challenging due to a number of difficulties, including the treatment of motion discontinuities and occluded regions, and the integration of information from more than two frames. One reason for these issues is that most optical flow algorithms only reason about the motion of pixels on the image plane, while not taking the image formation pipeline or the 3D structure of the world into account. One approach to address this uses layered models, which represent the occlusion structure of a scene and provide an approximation to the geometry. The goal of this dissertation is to show ways to inject additional knowledge about the scene into layered methods, making them more robust, faster, and more accurate. First, this thesis demonstrates the modeling power of layers using the example of motion blur in videos, which is caused by fast motion relative to the exposure time of the camera. Layers segment the scene into regions that move coherently while preserving their occlusion relationships. The motion of each layer therefore directly determines its motion blur. At the same time, the layered model captures complex blur overlap effects at motion discontinuities. Using layers, we can thus formulate a generative model for blurred video sequences, and use this model to simultaneously deblur a video and compute accurate optical flow for highly dynamic scenes containing motion blur. Next, we consider the representation of the motion within layers. Since, in a layered model, important motion discontinuities are captured by the segmentation into layers, the flow within each layer varies smoothly and can be approximated using a low dimensional subspace. We show how this subspace can be learned from training data using principal component analysis (PCA), and that flow estimation using this subspace is computationally efficient. The combination of the layered model and the low-dimensional subspace gives the best of both worlds, sharp motion discontinuities from the layers and computational efficiency from the subspace. Lastly, we show how layered methods can be dramatically improved using simple semantics. Instead of treating all layers equally, a semantic segmentation divides the scene into its static parts and moving objects. Static parts of the scene constitute a large majority of what is shown in typical video sequences; yet, in such regions optical flow is fully constrained by the depth structure of the scene and the camera motion. After segmenting out moving objects, we consider only static regions, and explicitly reason about the structure of the scene and the camera motion, yielding much better optical flow estimates. Furthermore, computing the structure of the scene allows to better combine information from multiple frames, resulting in high accuracies even in occluded regions. For moving regions, we compute the flow using a generic optical flow method, and combine it with the flow computed for the static regions to obtain a full optical flow field. By combining layered models of the scene with reasoning about the dynamic behavior of the real, three-dimensional world, the methods presented herein push the envelope of optical flow computation in terms of robustness, speed, and accuracy, giving state-of-the-art results on benchmarks and pointing to important future research directions for the estimation of motion in natural scenes.

Official link DOI Project Page [BibTex]


Thumb xl animage2mask3
Assessing body image in anorexia nervosa using biometric self-avatars in virtual reality: Attitudinal components rather than visual body size estimation are distorted

Mölbert, S. C., Thaler, A., Mohler, B. J., Streuber, S., Romero, J., Black, M. J., Zipfel, S., Karnath, H., Giel, K. E.

Psychological Medicine, 48(4):642-653, March 2018 (article)

Abstract
Background: Body image disturbance (BID) is a core symptom of anorexia nervosa (AN), but as yet distinctive features of BID are unknown. The present study aimed at disentangling perceptual and attitudinal components of BID in AN. Methods: We investigated n=24 women with AN and n=24 controls. Based on a 3D body scan, we created realistic virtual 3D bodies (avatars) for each participant that were varied through a range of ±20% of the participants' weights. Avatars were presented in a virtual reality mirror scenario. Using different psychophysical tasks, participants identified and adjusted their actual and their desired body weight. To test for general perceptual biases in estimating body weight, a second experiment investigated perception of weight and shape matched avatars with another identity. Results: Women with AN and controls underestimated their weight, with a trend that women with AN underestimated more. The average desired body of controls had normal weight while the average desired weight of women with AN corresponded to extreme AN (DSM-5). Correlation analyses revealed that desired body weight, but not accuracy of weight estimation, was associated with eating disorder symptoms. In the second experiment, both groups estimated accurately while the most attractive body was similar to Experiment 1. Conclusions: Our results contradict the widespread assumption that patients with AN overestimate their body weight due to visual distortions. Rather, they illustrate that BID might be driven by distorted attitudes with regard to the desired body. Clinical interventions should aim at helping patients with AN to change their desired weight.

doi pdf DOI Project Page [BibTex]


Thumb xl plos1
Body size estimation of self and others in females varying in BMI

Thaler, A., Geuss, M. N., Mölbert, S. C., Giel, K. E., Streuber, S., Romero, J., Black, M. J., Mohler, B. J.

PLoS ONE, 13(2), Febuary 2018 (article)

Abstract
Previous literature suggests that a disturbed ability to accurately identify own body size may contribute to overweight. Here, we investigated the influence of personal body size, indexed by body mass index (BMI), on body size estimation in a non-clinical population of females varying in BMI. We attempted to disentangle general biases in body size estimates and attitudinal influences by manipulating whether participants believed the body stimuli (personalized avatars with realistic weight variations) represented their own body or that of another person. Our results show that the accuracy of own body size estimation is predicted by personal BMI, such that participants with lower BMI underestimated their body size and participants with higher BMI overestimated their body size. Further, participants with higher BMI were less likely to notice the same percentage of weight gain than participants with lower BMI. Importantly, these results were only apparent when participants were judging a virtual body that was their own identity (Experiment 1), but not when they estimated the size of a body with another identity and the same underlying body shape (Experiment 2a). The different influences of BMI on accuracy of body size estimation and sensitivity to weight change for self and other identity suggests that effects of BMI on visual body size estimation are self-specific and not generalizable to other bodies.

pdf DOI Project Page [BibTex]


Thumb xl yanzhang clustering
Temporal Human Action Segmentation via Dynamic Clustering

Zhang, Y., Sun, H., Tang, S., Neumann, H.

arXiv preprint arXiv:1803.05790, 2018 (article)

Abstract
We present an effective dynamic clustering algorithm for the task of temporal human action segmentation, which has comprehensive applications such as robotics, motion analysis, and patient monitoring. Our proposed algorithm is unsupervised, fast, generic to process various types of features, and applica- ble in both the online and offline settings. We perform extensive experiments of processing data streams, and show that our algorithm achieves the state-of- the-art results for both online and offline settings.

link (url) [BibTex]

link (url) [BibTex]


Thumb xl motion segmentation tracking clustering teaser
Motion Segmentation & Multiple Object Tracking by Correlation Co-Clustering

Keuper, M., Tang, S., Andres, B., Brox, T., Schiele, B.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018 (article)

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]

2006


Thumb xl screen shot 2012 06 06 at 11.31.38 am
Implicit Wiener Series, Part II: Regularised estimation

Gehler, P., Franz, M.

(148), Max Planck Institute, 2006 (techreport)

pdf [BibTex]

2006


Thumb xl evatr
HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion

Sigal, L., Black, M. J.

(CS-06-08), Brown University, Department of Computer Science, 2006 (techreport)

pdf abstract [BibTex]

pdf abstract [BibTex]


Thumb xl neuralcomp
Bayesian population decoding of motor cortical activity using a Kalman filter

Wu, W., Gao, Y., Bienenstock, E., Donoghue, J. P., Black, M. J.

Neural Computation, 18(1):80-118, 2006 (article)

Abstract
Effective neural motor prostheses require a method for decoding neural activity representing desired movement. In particular, the accurate reconstruction of a continuous motion signal is necessary for the control of devices such as computer cursors, robots, or a patient's own paralyzed limbs. For such applications, we developed a real-time system that uses Bayesian inference techniques to estimate hand motion from the firing rates of multiple neurons. In this study, we used recordings that were previously made in the arm area of primary motor cortex in awake behaving monkeys using a chronically implanted multielectrode microarray. Bayesian inference involves computing the posterior probability of the hand motion conditioned on a sequence of observed firing rates; this is formulated in terms of the product of a likelihood and a prior. The likelihood term models the probability of firing rates given a particular hand motion. We found that a linear gaussian model could be used to approximate this likelihood and could be readily learned from a small amount of training data. The prior term defines a probabilistic model of hand kinematics and was also taken to be a linear gaussian model. Decoding was performed using a Kalman filter, which gives an efficient recursive method for Bayesian inference when the likelihood and prior are linear and gaussian. In off-line experiments, the Kalman filter reconstructions of hand trajectory were more accurate than previously reported results. The resulting decoding algorithm provides a principled probabilistic model of motor-cortical coding, decodes hand motion in real time, provides an estimate of uncertainty, and is straightforward to implement. Additionally the formulation unifies and extends previous models of neural coding while providing insights into the motor-cortical code.

pdf preprint pdf from publisher abstract [BibTex]

pdf preprint pdf from publisher abstract [BibTex]

1999


Thumb xl bildschirmfoto 2012 12 06 um 09.38.15
Parameterized modeling and recognition of activities

Yacoob, Y., Black, M. J.

Computer Vision and Image Understanding, 73(2):232-247, 1999 (article)

Abstract
In this paper we consider a class of human activities—atomic activities—which can be represented as a set of measurements over a finite temporal window (e.g., the motion of human body parts during a walking cycle) and which has a relatively small space of variations in performance. A new approach for modeling and recognition of atomic activities that employs principal component analysis and analytical global transformations is proposed. The modeling of sets of exemplar instances of activities that are similar in duration and involve similar body part motions is achieved by parameterizing their representation using principal component analysis. The recognition of variants of modeled activities is achieved by searching the space of admissible parameterized transformations that these activities can undergo. This formulation iteratively refines the recognition of the class to which the observed activity belongs and the transformation parameters that relate it to the model in its class. We provide several experiments on recognition of articulated and deformable human motions from image motion parameters.

pdf pdf from publisher DOI [BibTex]

1999

pdf pdf from publisher DOI [BibTex]

1997


Thumb xl yasersmile
Recognizing facial expressions in image sequences using local parameterized models of image motion

Black, M. J., Yacoob, Y.

Int. Journal of Computer Vision, 25(1):23-48, 1997 (article)

Abstract
This paper explores the use of local parametrized models of image motion for recovering and recognizing the non-rigid and articulated motion of human faces. Parametric flow models (for example affine) are popular for estimating motion in rigid scenes. We observe that within local regions in space and time, such models not only accurately model non-rigid facial motions but also provide a concise description of the motion in terms of a small number of parameters. These parameters are intuitively related to the motion of facial features during facial expressions and we show how expressions such as anger, happiness, surprise, fear, disgust, and sadness can be recognized from the local parametric motions in the presence of significant head motion. The motion tracking and expression recognition approach performed with high accuracy in extensive laboratory experiments involving 40 subjects as well as in television and movie sequences.

pdf pdf from publisher abstract video [BibTex]