Header logo is ps


2020


Learning to Dress 3D People in Generative Clothing
Learning to Dress 3D People in Generative Clothing

Ma, Q., Yang, J., Ranjan, A., Pujades, S., Pons-Moll, G., Tang, S., Black, M. J.

In Computer Vision and Pattern Recognition (CVPR), June 2020 (inproceedings)

Abstract
Three-dimensional human body models are widely used in the analysis of human pose and motion. Existing models, however, are learned from minimally-clothed 3D scans and thus do not generalize to the complexity of dressed people in common images and videos. Additionally, current models lack the expressive power needed to represent the complex non-linear geometry of pose-dependent clothing shape. To address this, we learn a generative 3D mesh model of clothed people from 3D scans with varying pose and clothing. Specifically, we train a conditional Mesh-VAE-GAN to learn the clothing deformation from the SMPL body model, making clothing an additional term on SMPL. Our model is conditioned on both pose and clothing type, giving the ability to draw samples of clothing to dress different body shapes in a variety of styles and poses. To preserve wrinkle detail, our Mesh-VAE-GAN extends patchwise discriminators to 3D meshes. Our model, named CAPE, represents global shape and fine local structure, effectively extending the SMPL body model to clothing. To our knowledge, this is the first generative model that directly dresses 3D human body meshes and generalizes to different poses.

arxiv project page [BibTex]

2020


{GENTEL : GENerating Training data Efficiently for Learning to segment medical images}
GENTEL : GENerating Training data Efficiently for Learning to segment medical images

Thakur, R. P., Rocamora, S. P., Goel, L., Pohmann, R., Machann, J., Black, M. J.

Congrès Reconnaissance des Formes, Image, Apprentissage et Perception (RFAIP), June 2020 (conference)

Abstract
Accurately segmenting MRI images is crucial for many clinical applications. However, manually segmenting images with accurate pixel precision is a tedious and time consuming task. In this paper we present a simple, yet effective method to improve the efficiency of the image segmentation process. We propose to transform the image annotation task into a binary choice task. We start by using classical image processing algorithms with different parameter values to generate multiple, different segmentation masks for each input MRI image. Then, instead of segmenting the pixels of the images, the user only needs to decide whether a segmentation is acceptable or not. This method allows us to efficiently obtain high quality segmentations with minor human intervention. With the selected segmentations, we train a state-of-the-art neural network model. For the evaluation, we use a second MRI dataset (1.5T Dataset), acquired with a different protocol and containing annotations. We show that the trained network i) is able to automatically segment cases where none of the classical methods obtain a high quality result ; ii) generalizes to the second MRI dataset, which was acquired with a different protocol and was never seen at training time ; and iii) enables detection of miss-annotations in this second dataset. Quantitatively, the trained network obtains very good results: DICE score - mean 0.98, median 0.99- and Hausdorff distance (in pixels) - mean 4.7, median 2.0-.

[BibTex]

[BibTex]


Generating 3D People in Scenes without People
Generating 3D People in Scenes without People

Zhang, Y., Hassan, M., Neumann, H., Black, M. J., Tang, S.

In Computer Vision and Pattern Recognition (CVPR), June 2020 (inproceedings)

Abstract
We present a fully automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene. Given a 3D scene without people, humans can easily imagine how people could interact with the scene and the objects in it. However, this is a challenging task for a computer as solving it requires that (1) the generated human bodies to be semantically plausible within the 3D environment (e.g. people sitting on the sofa or cooking near the stove), and (2) the generated human-scene interaction to be physically feasible such that the human body and scene do not interpenetrate while, at the same time, body-scene contact supports physical interactions. To that end, we make use of the surface-based 3D human model SMPL-X. We first train a conditional variational autoencoder to predict semantically plausible 3D human poses conditioned on latent scene representations, then we further refine the generated 3D bodies using scene constraints to enforce feasible physical interaction. We show that our approach is able to synthesize realistic and expressive 3D human bodies that naturally interact with 3D environment. We perform extensive experiments demonstrating that our generative framework compares favorably with existing methods, both qualitatively and quantitatively. We believe that our scene-conditioned 3D human generation pipeline will be useful for numerous applications; e.g. to generate training data for human pose estimation, in video games and in VR/AR. Our project page for data and code can be seen at: \url{https://vlg.inf.ethz.ch/projects/PSI/}.

Code PDF [BibTex]

Code PDF [BibTex]


Learning Physics-guided Face Relighting under Directional Light
Learning Physics-guided Face Relighting under Directional Light

Nestmeyer, T., Lalonde, J., Matthews, I., Lehrmann, A. M.

In Conference on Computer Vision and Pattern Recognition, IEEE/CVF, June 2020 (inproceedings) Accepted

Abstract
Relighting is an essential step in realistically transferring objects from a captured image into another environment. For example, authentic telepresence in Augmented Reality requires faces to be displayed and relit consistent with the observer's scene lighting. We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face. Our model decomposes the input image into intrinsic components according to a diffuse physics-based image formation model. We enable non-diffuse effects including cast shadows and specular highlights by predicting a residual correction to the diffuse render. To train and evaluate our model, we collected a portrait database of 21 subjects with various expressions and poses. Each sample is captured in a controlled light stage setup with 32 individual light sources. Our method creates precise and believable relighting results and generalizes to complex illumination conditions and challenging poses, including when the subject is not looking straight at the camera.

Paper [BibTex]

Paper [BibTex]


{VIBE}: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape Estimation

Kocabas, M., Athanasiou, N., Black, M. J.

In Computer Vision and Pattern Recognition (CVPR), June 2020 (inproceedings)

Abstract
Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methodsfail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose “Video Inference for Body Pose and Shape Estimation” (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance. Code and pretrained models are available at https://github.com/mkocabas/VIBE

arXiv code [BibTex]

arXiv code [BibTex]


From Variational to Deterministic Autoencoders
From Variational to Deterministic Autoencoders

Ghosh*, P., Sajjadi*, M. S. M., Vergari, A., Black, M. J., Schölkopf, B.

8th International Conference on Learning Representations (ICLR) , April 2020, *equal contribution (conference) Accepted

Abstract
Variational Autoencoders (VAEs) provide a theoretically-backed framework for deep generative models. However, they often produce “blurry” images, which is linked to their training objective. Sampling in the most popular implementation, the Gaussian VAE, can be interpreted as simply injecting noise to the input of a deterministic decoder. In practice, this simply enforces a smooth latent space structure. We challenge the adoption of the full VAE framework on this specific point in favor of a simpler, deterministic one. Specifically, we investigate how substituting stochasticity with other explicit and implicit regularization schemes can lead to a meaningful latent space without having to force it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism for sampling new data points, we propose to employ an efficient ex-post density estimation step that can be readily adopted both for the proposed deterministic autoencoders as well as to improve sample quality of existing VAEs. We show in a rigorous empirical study that regularized deterministic autoencoding achieves state-of-the-art sample quality on the common MNIST, CIFAR-10 and CelebA datasets.

arXiv [BibTex]

arXiv [BibTex]


Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations
Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations

Rueegg, N., Lassner, C., Black, M. J., Schindler, K.

In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), Febuary 2020 (inproceedings)

Abstract
The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the problem into a series of transformations between increasingly abstract representations. Each step involves a cycle designed to be learnable without annotated training data, and the chain of cycles delivers the final solution. Specifically, we use 2D body part segments as an intermediate representation that contains enough information to be lifted to 3D, and at the same time is simple enough to be learned in an unsupervised way. We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images. We also explore varying amounts of paired data and show that cycling greatly alleviates the need for paired data. While we present results for modeling humans, our formulation is general and can be applied to other vision problems.

pdf [BibTex]

pdf [BibTex]

2002


Inferring hand motion from multi-cell recordings in motor cortex using a {Kalman} filter
Inferring hand motion from multi-cell recordings in motor cortex using a Kalman filter

Wu, W., Black, M. J., Gao, Y., Bienenstock, E., Serruya, M., Donoghue, J. P.

In SAB’02-Workshop on Motor Control in Humans and Robots: On the Interplay of Real Brains and Artificial Devices, pages: 66-73, Edinburgh, Scotland (UK), August 2002 (inproceedings)

pdf [BibTex]

2002

pdf [BibTex]


no image
Inferring hand motion from multi-cell recordings in motor cortex using a Kalman filter

Wu, W., Black M., Gao, Y., Bienenstock, E., Serruya, M., Donoghue, J.

Program No. 357.5. 2002 Abstract Viewer/Itinerary Planner, Society for Neuroscience, Washington, DC, 2002, Online (conference)

abstract [BibTex]

abstract [BibTex]


Probabilistic inference of hand motion from neural activity in motor cortex
Probabilistic inference of hand motion from neural activity in motor cortex

Gao, Y., Black, M. J., Bienenstock, E., Shoham, S., Donoghue, J.

In Advances in Neural Information Processing Systems 14, pages: 221-228, MIT Press, 2002 (inproceedings)

Abstract
Statistical learning and probabilistic inference techniques are used to infer the hand position of a subject from multi-electrode recordings of neural activity in motor cortex. First, an array of electrodes provides train- ing data of neural firing conditioned on hand kinematics. We learn a non- parametric representation of this firing activity using a Bayesian model and rigorously compare it with previous models using cross-validation. Second, we infer a posterior probability distribution over hand motion conditioned on a sequence of neural test data using Bayesian inference. The learned firing models of multiple cells are used to define a non- Gaussian likelihood term which is combined with a prior probability for the kinematics. A particle filtering method is used to represent, update, and propagate the posterior distribution over time. The approach is com- pared with traditional linear filtering methods; the results suggest that it may be appropriate for neural prosthetic applications.

pdf [BibTex]

pdf [BibTex]


Automatic detection and tracking of human motion with a view-based representation
Automatic detection and tracking of human motion with a view-based representation

Fablet, R., Black, M. J.

In European Conf. on Computer Vision, ECCV 2002, 1, pages: 476-491, LNCS 2353, (Editors: A. Heyden and G. Sparr and M. Nielsen and P. Johansen), Springer-Verlag , 2002 (inproceedings)

Abstract
This paper proposes a solution for the automatic detection and tracking of human motion in image sequences. Due to the complexity of the human body and its motion, automatic detection of 3D human motion remains an open, and important, problem. Existing approaches for automatic detection and tracking focus on 2D cues and typically exploit object appearance (color distribution, shape) or knowledge of a static background. In contrast, we exploit 2D optical flow information which provides rich descriptive cues, while being independent of object and background appearance. To represent the optical flow patterns of people from arbitrary viewpoints, we develop a novel representation of human motion using low-dimensional spatio-temporal models that are learned using motion capture data of human subjects. In addition to human motion (the foreground) we probabilistically model the motion of generic scenes (the background); these statistical models are defined as Gibbsian fields specified from the first-order derivatives of motion observations. Detection and tracking are posed in a principled Bayesian framework which involves the computation of a posterior probability distribution over the model parameters (i.e., the location and the type of the human motion) given a sequence of optical flow observations. Particle filtering is used to represent and predict this non-Gaussian posterior distribution over time. The model parameters of samples from this distribution are related to the pose parameters of a 3D articulated model (e.g. the approximate joint angles and movement direction). Thus the approach proves suitable for initializing more complex probabilistic models of human motion. As shown by experiments on real image sequences, our method is able to detect and track people under different viewpoints with complex backgrounds.

pdf [BibTex]

pdf [BibTex]


A layered motion representation with occlusion and compact spatial support
A layered motion representation with occlusion and compact spatial support

Fleet, D. J., Jepson, A., Black, M. J.

In European Conf. on Computer Vision, ECCV 2002, 1, pages: 692-706, LNCS 2353, (Editors: A. Heyden and G. Sparr and M. Nielsen and P. Johansen), Springer-Verlag , 2002 (inproceedings)

Abstract
We describe a 2.5D layered representation for visual motion analysis. The representation provides a global interpretation of image motion in terms of several spatially localized foreground regions along with a background region. Each of these regions comprises a parametric shape model and a parametric motion model. The representation also contains depth ordering so visibility and occlusion are rightly included in the estimation of the model parameters. Finally, because the number of objects, their positions, shapes and sizes, and their relative depths are all unknown, initial models are drawn from a proposal distribution, and then compared using a penalized likelihood criterion. This allows us to automatically initialize new models, and to compare different depth orderings.

pdf [BibTex]

pdf [BibTex]


Implicit probabilistic models of human motion for synthesis and tracking
Implicit probabilistic models of human motion for synthesis and tracking

Sidenbladh, H., Black, M. J., Sigal, L.

In European Conf. on Computer Vision, 1, pages: 784-800, 2002 (inproceedings)

Abstract
This paper addresses the problem of probabilistically modeling 3D human motion for synthesis and tracking. Given the high dimensional nature of human motion, learning an explicit probabilistic model from available training data is currently impractical. Instead we exploit methods from texture synthesis that treat images as representing an implicit empirical distribution. These methods replace the problem of representing the probability of a texture pattern with that of searching the training data for similar instances of that pattern. We extend this idea to temporal data representing 3D human motion with a large database of example motions. To make the method useful in practice, we must address the problem of efficient search in a large training set; efficiency is particularly important for tracking. Towards that end, we learn a low dimensional linear model of human motion that is used to structure the example motion database into a binary tree. An approximate probabilistic tree search method exploits the coefficients of this low-dimensional representation and runs in sub-linear time. This probabilistic tree search returns a particular sample human motion with probability approximating the true distribution of human motions in the database. This sampling method is suitable for use with particle filtering techniques and is applied to articulated 3D tracking of humans within a Bayesian framework. Successful tracking results are presented, along with examples of synthesizing human motion using the model.

pdf [BibTex]

pdf [BibTex]


Robust parameterized component analysis: Theory and applications to {2D} facial modeling
Robust parameterized component analysis: Theory and applications to 2D facial modeling

De la Torre, F., Black, M. J.

In European Conf. on Computer Vision, ECCV 2002, 4, pages: 653-669, LNCS 2353, Springer-Verlag, 2002 (inproceedings)

pdf [BibTex]

pdf [BibTex]

2001


Dynamic coupled component analysis
Dynamic coupled component analysis

De la Torre, F., Black, M. J.

In IEEE Proc. Computer Vision and Pattern Recognition, CVPR’01, 2, pages: 643-650, IEEE, Kauai, Hawaii, December 2001 (inproceedings)

pdf [BibTex]

2001

pdf [BibTex]


Robust principal component analysis for computer vision
Robust principal component analysis for computer vision

De la Torre, F., Black, M. J.

In Int. Conf. on Computer Vision, ICCV-2001, II, pages: 362-369, Vancouver, BC, USA, 2001 (inproceedings)

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Learning image statistics for {Bayesian} tracking
Learning image statistics for Bayesian tracking

Sidenbladh, H., Black, M. J.

In Int. Conf. on Computer Vision, ICCV-2001, II, pages: 709-716, Vancouver, BC, USA, 2001 (inproceedings)

pdf [BibTex]

pdf [BibTex]


no image
Encoding/decoding of arm kinematics from simultaneously recorded MI neurons

Gao, Y., Bienenstock, E., Black, M., Shoham, S., Serruya, M., Donoghue, J.

Society for Neuroscience Abst. Vol. 27, Program No. 572.14, 2001 (conference)

abstract [BibTex]

abstract [BibTex]


Learning and tracking cyclic human motion
Learning and tracking cyclic human motion

Ormoneit, D., Sidenbladh, H., Black, M. J., Hastie, T.

In Advances in Neural Information Processing Systems 13, NIPS, pages: 894-900, (Editors: Leen, Todd K. and Dietterich, Thomas G. and Tresp, Volker), The MIT Press, 2001 (inproceedings)

pdf [BibTex]

pdf [BibTex]

1997


Robust anisotropic diffusion and sharpening of scalar and vector images
Robust anisotropic diffusion and sharpening of scalar and vector images

Black, M. J., Sapiro, G., Marimont, D., Heeger, D.

In Int. Conf. on Image Processing, ICIP, 1, pages: 263-266, Vol. 1, Santa Barbara, CA, October 1997 (inproceedings)

Abstract
Relations between anisotropic diffusion and robust statistics are described. We show that anisotropic diffusion can be seen as a robust estimation procedure that estimates a piecewise smooth image from a noisy input image. The "edge-stopping" function in the anisotropic diffusion equation is closely related to the error norm and influence function in the robust estimation framework. This connection leads to a new "edge-stopping" function based on Tukey's biweight robust estimator, that preserves sharper boundaries than previous formulations and improves the automatic stopping of the diffusion. The robust statistical interpretation also provides a means for detecting the boundaries (edges) between the piecewise smooth regions in the image. We extend the framework to vector-valued images and show applications to robust image sharpening.

pdf publisher site [BibTex]

1997

pdf publisher site [BibTex]


Robust anisotropic diffusion: Connections between robust statistics, line processing, and anisotropic diffusion
Robust anisotropic diffusion: Connections between robust statistics, line processing, and anisotropic diffusion

Black, M. J., Sapiro, G., Marimont, D., Heeger, D.

In Scale-Space Theory in Computer Vision, Scale-Space’97, pages: 323-326, LNCS 1252, Springer Verlag, Utrecht, the Netherlands, July 1997 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Learning parameterized models of image motion
Learning parameterized models of image motion

Black, M. J., Yacoob, Y., Jepson, A. D., Fleet, D. J.

In IEEE Conf. on Computer Vision and Pattern Recognition, CVPR-97, pages: 561-567, Puerto Rico, June 1997 (inproceedings)

Abstract
A framework for learning parameterized models of optical flow from image sequences is presented. A class of motions is represented by a set of orthogonal basis flow fields that are computed from a training set using principal component analysis. Many complex image motions can be represented by a linear combination of a small number of these basis flows. The learned motion models may be used for optical flow estimation and for model-based recognition. For optical flow estimation we describe a robust, multi-resolution scheme for directly computing the parameters of the learned flow models from image derivatives. As examples we consider learning motion discontinuities, non-rigid motion of human mouths, and articulated human motion.

pdf [BibTex]

pdf [BibTex]


Analysis of gesture and action in technical talks for video indexing
Analysis of gesture and action in technical talks for video indexing

Ju, S. X., Black, M. J., Minneman, S., Kimber, D.

In IEEE Conf. on Computer Vision and Pattern Recognition, pages: 595-601, CVPR-97, Puerto Rico, June 1997 (inproceedings)

Abstract
In this paper, we present an automatic system for analyzing and annotating video sequences of technical talks. Our method uses a robust motion estimation technique to detect key frames and segment the video sequence into subsequences containing a single overhead slide. The subsequences are stabilized to remove motion that occurs when the speaker adjusts their slides. Any changes remaining between frames in the stabilized sequences may be due to speaker gestures such as pointing or writing and we use active contours to automatically track these potential gestures. Given the constrained domain we define a simple ``vocabulary'' of actions which can easily be recognized based on the active contour shape and motion. The recognized actions provide a rich annotation of the sequence that can be used to access a condensed version of the talk from a web page.

pdf [BibTex]

pdf [BibTex]


Modeling appearance change in image sequences
Modeling appearance change in image sequences

Black, M. J., Yacoob, Y., Fleet, D. J.

In Advances in Visual Form Analysis, pages: 11-20, Proceedings of the Third International Workshop on Visual Form, Capri, Italy, May 1997 (inproceedings)

abstract [BibTex]

abstract [BibTex]

1991


Dynamic motion estimation and feature extraction over long image sequences
Dynamic motion estimation and feature extraction over long image sequences

Black, M. J., Anandan, P.

In Proc. IJCAI Workshop on Dynamic Scene Understanding, Sydney, Australia, August 1991 (inproceedings)

[BibTex]

1991

[BibTex]


Robust dynamic motion estimation over time
Robust dynamic motion estimation over time

(IEEE Computer Society Outstanding Paper Award)

Black, M. J., Anandan, P.

In Proc. Computer Vision and Pattern Recognition, CVPR-91,, pages: 296-302, Maui, Hawaii, June 1991 (inproceedings)

Abstract
This paper presents a novel approach to incrementally estimating visual motion over a sequence of images. We start by formulating constraints on image motion to account for the possibility of multiple motions. This is achieved by exploiting the notions of weak continuity and robust statistics in the formulation of the minimization problem. The resulting objective function is non-convex. Traditional stochastic relaxation techniques for minimizing such functions prove inappropriate for the task. We present a highly parallel incremental stochastic minimization algorithm which has a number of advantages over previous approaches. The incremental nature of the scheme makes it truly dynamic and permits the detection of occlusion and disocclusion boundaries.

pdf video abstract [BibTex]

pdf video abstract [BibTex]