Michael J. Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. in computer science from Yale University (1992). After research at NASA Ames and post-doctoral research at the University of Toronto, he joined the Xerox Palo Alto Research Center in 1993 where he later managed the Image Understanding Area and founded the Digital Video Analysis group. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is a founding director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is an honorary professor at the University of Tübingen, a visiting professor at ETH Zürich, and an adjunct professor (research) at Brown University.
Black is a foreign member of the Royal Swedish Academy of Sciences. He is a recipient of the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision and the 2013 Helmholtz Prize for work that has stood the test of time. His work has won several paper awards including the IEEE Computer Society Outstanding Paper Award (CVPR'91). His work received Honorable Mention for the Marr Prize in 1999 and 2005. His early work on optical flow has been widely used in Hollywood films including for the Academy-Award-winning effects in “What Dreams May Come” and “The Matrix Reloaded.” He has contributed to several influential datasets including the Middlebury Flow dataset, HumanEva, and the Sintel dataset. He is a co-founder, science advisor, and member of the board of directors of Body Labs Inc., which is commercializing his team’s research on 3D human body shape.
Prof. Black's research interests in machine vision include optical flow estimation, 3D shape models, human shape and motion analysis, robust statistical methods, and probabilistic models of the visual world. In computational neuroscience his work focuses on probabilistic models of the neural code and applications of neural decoding in neural prosthetics.
Michael Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. from Yale University (1992). After post-doctoral research at the University of Toronto, he worked at Xerox PARC as a member of research staff and an area manager. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is one of the founding directors at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is an Honorarprofessor at the University of Tuebingen, Visiting Professor at ETH Zürich, and Adjunct Professor (Research) at Brown University. His work has won several awards including the IEEE Computer Society Outstanding Paper Award (1991), Honorable Mention for the Marr Prize (1999 and 2005), the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision, and the 2013 Helmholtz Prize for work that has stood the test of time. He is a foreign member of the Royal Swedish Academy of Sciences. He is also a co-founder, science advisor, and board member of Body Labs Inc.
Royal Swedish Academy of Sciences
Foreign member, Class for Engineering Sciences, since June 2015.
for the paper: Black, M. J., and Anandan, P., "A framework for the robust estimation of optical flow,'' IEEE International Conference on Computer Vision, ICCV, pages 231-236, Berlin, Germany. May 1993.
2010Koenderink Prize for Fundamental Contributions in Computer Vision,
with Sidenbladh, H. and Fleet, D. J. for the paper "Stochastic tracking of 3D human figures using 2D image motion,'' European Conference on Computer Vision, 2000.
"Dataset Award" at the Eurographics Symposium on Geometry Processing 2016, with F. Bogo, J. Romero, and M. Loper, for the paper "FAUST: Dataset and evaluation for 3D mesh registration," CVPR 2014.
Best Paper Award, International Conference on 3D Vision (3DV), 2015, with A. O. Ulusoy and A. Geiger, for the paper "Towards Probabilistic Volumetric Reconstruction using Ray Potentials."
Best Paper Award, INI-Graphics Net, 2008, First Prize Winner of Category Research,
with S. Roth for the paper "Steerable random fields."
Best Paper Award, Fourth International Conference on Articulated Motion and Deformable Objects (AMDO-e 2006), with L. Sigal for the paper "Predicting 3D people from 2D pictures.''
Marr Prize, Honorable Mention, Int. Conf. on Computer Vision, ICCV-2005, Beijing, China, Oct. 2005 with S. Roth for the paper "On the spatial statistics of optical flow.''
Marr Prize, Honorable Mention, Int. Conf. on Computer Vision, ICCV-99, Corfu, Greece, Sept. 1999 with D. J. Fleet for the paper "Probabilistic detection and tracking of motion discontinuities.''
IEEE Computer Society, Outstanding Paper Award, Conference on Computer Vision and Pattern Recognition, Maui, Hawaii, June 1991 with P. Anandan for the paper "Robust dynamic motion estimation over time.''
Commendation and Chief's Award, Henrico County Division of Police,
County of Henrico, Virginia, April 19, 2007.
University of Maryland, Invention of the Year, 1995, "Tracking and Recognizing Facial Expressions,'' with Y. Yacoob.
University of Toronto, Computer Science Students' Union Teaching Award for 1992-1993.
My research addressed the problem of estimating and explaining motion in image sequences. I developed methods detecting and tracking 2D and 3D human motion including the introduction of particle filtering for 3D human tracking and belief propagation for 3D human pose estimation. I worked on probabilistic models of images include the high-order Field of Experts model. I worked on 3D human shape estimation from images and video and developed applications of this technology. I also developed mathematical models for decoding neural signals. This included the first uses of particle filtering and Kalman filtering for decoding motor cortical neural activity and the first point-and-click cortical neural brain-machine-interface for people with paralysis.
Research included modeling image changes (motion, illumination, specularity, occlusion, etc.) in video as a mixture of causes. I developed methods of motion explanation; that is, the extraction of mid-level or high-level concepts from motion.This included the modeling and recognition of motion "features" (occlusion boundaries, moving bars, etc.), human facial expressions and gestures, and motion "texture" (plants, fire, water, etc.). I applied these methods to problems in video indexing, motion for video annotation, teleconferencing, and gestural user interfaces. Other research included robust learning of image-based models, regularization with transparency, anisotropic diffusion, and the recovery of multiple shapes from transparent textures.
Research included the application of mixture models to optical flow, detection and tracking of surface discontinuities using motion information, and robust surface recovery in dynamic environments.
Yale University, (9/89-8/92) New Haven, CT
Research Assistant, Department of Computer Science.
Research in the recovery of optical flow, incremental estimation, temporal continuity, applications of robust statistics to optical flow, the relationship between robust statistics and line processes, the early detection of motion discontinuities, and the role of representation in computer vision.
Developed motion estimation algorithms in the context of an autonomous Mars landing and nap-of-the-earth helicopter flight and studied the psychophysical implications of a temporal continuity assumption.
Research on spatial reasoning for robotic vehicle route planning and terrain analysis. Vision research including perceptual grouping, object-based translational motion processing, the integration of vision and control for an autonomous vehicle, object modeling using generalized cylinders, and the development of an object-oriented vision environment.
GTE Government Systems, (6/85-12/86) Mountain View, CA
Engineer, Artificial Intelligence Group.
Developed expert systems for multi-source data fusion and fault location.
Summer undergraduate researcher at UBC; park ranger's assistant; volunteer firefighter, busboy; and probably my worst job: cleaning dog kennels.
I am interested in motion. What does motion tell us about the structure of the world and how can we compute this from video? How do humans and animals move? How does the brain control complex movement? My work combines computer vision, graphics and neuroscience to develop new models and algorithms to capture and analyze the motion of the world.
My Computer Vision research addresses:
the estimation of scene structure and physical properties from video;
modeling the neural control of reaching and grasping;
novel neural decoding algorithms;
neural prostheses and cortical brain-machine interfaces;
markless animal motion capture.
What is maybe unique about my work is the combination of the these themes. For example I study human motion from the inside (decoding neural activity in paralyzed humans) and the outside (with novel motion capture techniques).
Frank Wood, Associate Professor, Department of Engineering, Oxford
Thesis: Nonparametric Bayesian modeling of neural data. Department of Computer Science, Brown University
Hulya Yalcin, Assistant Professor, Department of Electronics and Communications Engineering, Istanbul Technical University, Turkey
Thesis: Implicit models of moving and static surfaces, Division of Engineering, Brown University, May 2004
Wei Wu, Associate Professor, Dept. of Statistics, Florida State
Thesis: Statistical models of neural coding in motor cortex, Division of Applied Math, Brown University. Co-supervised with David Mumford. May 2004.
Fernando De la Torre, Research Associate Professor, CMU and Facebook,
Thesis: Robust subspace learning for computer vision, La Salle School of Engineering. Universitat Ramon Llull, Barcelona, Spain. Jan. 2002
My old Brown site has several image sequences used in my older publications. These include some classic sequences such as Yosemite, the Pepsi can, the SRI tree sequence, and the Flower Garden sequence.
A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles behind Them
Sun, D., Roth, S., and Black, M.J. International Journal of Computer Vision (IJCV), 106(2):115-137, 2014. (pdf)
Secrets of optical flow estimation and their principles
Sun, D., Roth, S., and Black, M. J., IEEE Conf. on Computer Vision and Pattern Recog., CVPR, June 2010. (pdf)
This method implements many of the currently best known techniques for accurate optical flow and was once ranked #1 on the Middlebury evaluation (June 2010).
The software is made available for research pupropses. Please read the copyright statement and contact me for commerical licensing.
2. Matlab implmentation of the Black and Anandan dense optical flow method
The Matlab flow code is easier to use and more accurate than the original C code. The objective function being optimized is the same but the Matlab version uses more modern optimization methods:
The method in 1 above is more accurate and also implements Black and Anandan plus much more.
3. Original Black and Anandan method implemented in C
The optical flow software here has been used by a number of graphics companies to make special effects for movies. This software is provided for research purposes only; any sale or use for commercial purposes is strictly prohibited.
Contact me for the password to download the software, stating that it is for research purposes.
Please contact me if you wish to use this code for commercial purpose.
If you are a commercial enterprise and would like assistance in using optical flow in your application, please contact me at my consulting address firstname.lastname@example.org.
This is EXPERIMENTAL software. It is provided to illustrate some ideas in the robust estimation of optical flow. Use at your own risk. No warranty is implied by this distribution.
The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields,
Black, M. J. and Anandan, P., Computer Vision and Image Understanding, CVIU, 63(1), pp. 75-104, Jan. 1996. (pdf),(pdf from publisher)
Robust Principal Component Analysis (PCA)
Software is from the ICCV'2001 paper with Fernando De la Torre.
The code below provides a simple Matlab implementation of the Bayesian 3D person tracking system described in ECCV'00 and ICCV'01. It is too slow to be used to track the entire body but can be used to track various limbs and provides a basis for people who want to understand the methods better and extend them.
Stochastic tracking of 3D human figures using 2D image motion,
Sidenbladh, H., Black, M. J., and Fleet, D.J., European Conference on Computer Vision, D. Vernon (Ed.), Springer Verlag, LNCS 1843, Dublin, Ireland, pp. 702-718 June 2000. (postscript)(pdf), (abstract)
Software. (Note: if you uncompress and untar this on a PC using Winzip, the path names may be lost which will cause Matlab to fail when you load the .mat files. Instead uncompress/untar using gunzip and tar.)
In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 1378 -1385, Columbus, Ohio, USA, June 2014 (inproceedings)
We consider the intersection of two research fields: transfer learning and statistics on manifolds. In particular, we consider, for manifold-valued data, transfer learning of tangent-space models such as Gaussians distributions, PCA, regression, or classifiers. Though one would hope to simply use ordinary Rn-transfer learning ideas, the manifold structure prevents it. We overcome this by basing our method on inner-product-preserving parallel transport, a well-known tool widely used in other problems of statistics on manifolds in computer vision. At first, this straightforward idea seems to suffer from an obvious shortcoming: Transporting large datasets is prohibitively expensive, hindering scalability. Fortunately, with our approach, we never transport data. Rather, we show how the statistical models themselves can be transported, and prove that for the tangent-space models above, the transport “commutes” with learning. Consequently, our compact framework, applicable to a large class of manifolds, is not restricted by the size of either the training or test sets. We demonstrate the approach by transferring PCA and logistic-regression models of real-world data involving 3D shapes and image descriptors.
In Proceedings Winter Conference on Applications of Computer Vision, pages: 83-90, IEEE , March 2014 (inproceedings)
Extracting anthropometric or tailoring measurements from 3D human body scans is important for applications such as virtual try-on, custom clothing, and online sizing. Existing commercial solutions identify anatomical landmarks on high-resolution 3D scans and then compute distances or circumferences on the scan. Landmark detection is sensitive to acquisition noise (e.g. holes) and these methods require subjects to adopt a specific pose. In contrast, we propose a solution we call model-based anthropometry. We fit a deformable 3D body model to scan data in one or more poses; this model-based fitting is robust to scan noise. This brings the scan into registration with a database of registered body scans. Then, we extract features from the registered model (rather than from the scan); these include, limb lengths, circumferences, and statistical features of global shape. Finally, we learn a mapping from these features to measurements using regularized linear regression. We perform an extensive evaluation using the CAESAR dataset and demonstrate that the accuracy of our method outperforms state-of-the-art methods.
Homer, M., Perge, J., Black, M. J., Harrison, M., Cash, S., Hochberg, L.
IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(2):239-248, March 2014 (article)
Intracortical brain computer interfaces (iBCIs) decode intended movement from neural activity for the control of external devices such as a robotic arm. Standard approaches include a calibration phase to estimate decoding parameters. During iBCI operation, the statistical properties of the neural activity can depart from those observed during calibration, sometimes hindering a user’s ability to control the iBCI. To address this problem, we adaptively correct the offset terms within a Kalman filter decoder via penalized maximum likelihood estimation. The approach can handle rapid shifts in neural signal behavior (on the order of seconds) and requires no knowledge of the intended movement. The algorithm, called MOCA, was tested using simulated neural activity and evaluated retrospectively using data collected from two people with tetraplegia operating an iBCI. In 19 clinical research test cases, where a nonadaptive Kalman filter yielded relatively high decoding errors, MOCA significantly reduced these errors (10.6 ± 10.1\%; p < 0.05, pairwise t-test). MOCA did not significantly change the error in the remaining 23 cases where a nonadaptive Kalman filter already performed well. These results suggest that MOCA provides more robust decoding than the standard Kalman filter for iBCIs.
International Journal of Computer Vision (IJCV), 106(2):115-137, 2014 (article)
The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that "classical'' flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. One key implementation detail is the median filtering of intermediate flow fields during optimization. While this improves the robustness of classical methods it actually leads to higher energy solutions, meaning that these methods are not optimizing the original objective function. To understand the principles behind this phenomenon, we derive a new objective function that formalizes the median filtering heuristic. This objective function includes a non-local smoothness term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that can better preserve motion details. To take advantage of the trend towards video in wide-screen format, we further introduce an asymmetric pyramid downsampling scheme that enables the estimation of longer range horizontal motions. The methods are evaluated on Middlebury, MPI Sintel, and KITTI datasets using the same parameter settings.
(7), Max Planck Institute for Intelligent Systems, October 2013 (techreport)
We introduce Puppet Flow (PF), a layered model describing the optical flow of a person in a video sequence. We consider video frames composed by two layers: a foreground layer corresponding to a person, and background.
We model the background as an affine flow field. The foreground layer, being a moving person, requires reasoning about the articulated nature of the human body. We thus represent the foreground layer with the Deformable Structures model (DS), a parametrized 2D part-based human body representation. We call the motion field defined through articulated motion and deformation of the DS model, a Puppet Flow. By exploiting the DS representation, Puppet Flow is a parametrized optical flow field, where parameters are the person's pose, gender and body shape.
Systems, methods, and computer-readable storage media for simulating realistic clothing. The system generates a clothing deformation model for a clothing type, wherein the clothing deformation model factors a change of clothing shape due to rigid limb rotation, pose-independent body shape, and pose-dependent deformations. Next, the system generates a custom-shaped garment for a given body by mapping, via the clothing deformation model, body shape parameters to clothing shape parameters. The system then automatically dresses the given body with the custom- shaped garment.
In IEEE International Conference on Computer Vision (ICCV), pages: 3192-3199, IEEE, Sydney, Australia, December 2013 (inproceedings)
Although action recognition in videos is widely studied, current methods often fail on real-world datasets. Many recent approaches improve accuracy and robustness to cope with challenging video sequences, but it is often unclear
what affects the results most. This paper attempts to provide insights based on a systematic performance evaluation
using thoroughly-annotated data of human actions. We annotate human Joints for the HMDB dataset (J-HMDB). This annotation can be used to derive ground truth optical flow and segmentation. We evaluate current methods using
this dataset and systematically replace the output of various algorithms with ground truth. This enables us to discover what is important – for example, should we work on improving flow algorithms, estimating human bounding boxes, or enabling pose estimation? In summary, we find that highlevel pose features greatly outperform low/mid level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information. We also find that the accuracy of a top-performing action recognition framework can be greatly increased by refining the underlying low/mid level features; this suggests it is important to improve optical flow and human detection algorithms. Our analysis and JHMDB dataset should facilitate a deeper understanding of action recognition algorithms.
Homer, M., Harrison, M., Black, M. J., Perge, J., Cash, S., Friehs, G., Hochberg, L.
In 6th International IEEE EMBS Conference on Neural Engineering, pages: 715-718, San Diego, November 2013 (inproceedings)
Kalman filtering is a common method to decode neural signals from the motor cortex. In clinical research investigating the use of intracortical brain computer interfaces (iBCIs), the technique enabled people with tetraplegia to control assistive devices such as a computer or robotic arm directly from their neural activity. For reaching movements, the Kalman filter typically estimates the instantaneous endpoint velocity of the control device. Here, we analyzed attempted arm/hand movements by people with tetraplegia to control a cursor on a computer screen to reach several circular targets. A standard velocity Kalman filter is enhanced to additionally decode for the cursor’s position. We then mix decoded velocity and position to generate cursor movement commands. We analyzed data, offline, from two participants across six sessions. Root mean squared error between the actual and estimated
cursor trajectory improved by 12.2 ±10.5% (pairwise t-test, p<0.05) as compared to a standard velocity Kalman filter. The findings suggest that simultaneously decoding for intended velocity and position and using them both to generate movement commands can improve the performance of iBCIs.
In IEEE Conf. on Computer Vision and Pattern Recognition, (CVPR 2013), pages: 2451-2458, Portland, OR, June 2013 (inproceedings)
Layered models allow scene segmentation and motion estimation to be formulated together and to inform one another. Traditional layered motion methods, however, employ fairly weak models of scene structure, relying on locally connected Ising/Potts models which have limited ability to capture long-range correlations in natural scenes. To address this, we formulate a fully-connected layered model that enables global reasoning about the complicated segmentations of real objects. Optimization with fully-connected graphical models is challenging, and our inference algorithm leverages recent work on efficient mean field updates for fully-connected conditional random fields. These methods can be implemented efficiently using high-dimensional Gaussian filtering. We combine these ideas with a layered flow model, and find that the long-range connections greatly improve segmentation into figure-ground layers when compared with locally connected MRF models. Experiments on several benchmark datasets show that the method can recover fine structures and large occlusion regions, with good flow accuracy and much lower computational cost than previous locally-connected layered models.
Faces and bodies are complex structures, perception of which can play important roles in person identification and inference of emotional state. Face representations have been explored using behavioural adaptation: in particular, studies have shown that face aftereffects show relatively broad tuning for viewpoint, consistent with origin in a high-level structural descriptor far removed from the retinal image. Our goals were to determine first, if body aftereffects also showed a degree of viewpoint invariance, and second if they also showed pose invariance, given that changes in pose create even more dramatic changes in the 2-D retinal image. We used a 3-D model of the human body to generate headless body images, whose parameters could be varied to generate different body forms, viewpoints, and poses. In the first experiment, subjects adapted to varying viewpoints of either slim or heavy bodies in a neutral stance, followed by test stimuli that were all front-facing. In the second experiment, we used the same front-facing bodies in neutral stance as test stimuli, but compared adaptation from bodies in the same neutral stance to adaptation with the same bodies in different poses. We found that body aftereffects were obtained over substantial viewpoint changes, with no significant decline in aftereffect magnitude with increasing viewpoint difference between adapting and test images. Aftereffects also showed transfer across one change in pose but not across another. We conclude that body representations may have more viewpoint invariance than faces, and demonstrate at least some transfer across pose, consistent with a high-level structural description.
Keywords: aftereffect, shape, face, representation
In Advances in Neural Information Processing Systems 25 (NIPS), pages: 2006-2014, (Editors: P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger), MIT Press, 2012 (inproceedings)
We develop a method for discovering the parts of an articulated object from aligned meshes of the object in various three-dimensional poses. We adapt the distance dependent Chinese restaurant process (ddCRP) to allow nonparametric discovery of a potentially unbounded number of parts, while simultaneously guaranteeing a spatially connected segmentation. To allow analysis of datasets in which object instances have varying 3D shapes, we model part variability across poses via affine transformations. By placing a matrix normal-inverse-Wishart prior on these affine transformations, we develop a ddCRP Gibbs sampler which tractably marginalizes over transformation uncertainty. Analyzing a dataset of humans captured in dozens of poses, we infer parts which provide quantitatively better deformation predictions than conventional clustering methods.
In Consumer Depth Cameras for Computer Vision: Research Topics and Applications, pages: 99-118, 6, (Editors: Andrea Fossati and Juergen Gall and Helmut Grabner and Xiaofeng Ren and Kurt Konolige), Springer-Verlag, 2012 (incollection)
In European Conf. on Computer Vision (ECCV), pages: 242-255, LNCS 7577, Part IV, (Editors: A. Fitzgibbon et al. (Eds.)), Springer-Verlag, October 2012 (inproceedings)
Three-dimensional (3D) shape models are powerful because they enable the inference of object shape from incomplete, noisy, or ambiguous 2D or 3D data. For example, realistic parameterized 3D human body models have been used to infer the shape and pose of people from images. To train such models, a corpus of 3D body scans is typically brought into registration by aligning a common 3D human-shaped template to each scan. This is an ill-posed problem that typically involves solving an optimization problem with regularization terms that penalize
implausible deformations of the template. When aligning a corpus, however, we can do better than generic regularization. If we have a model of how the template can deform then alignments can be regularized by this
model. Constructing a model of deformations, however, requires having a corpus that is already registered. We address this chicken-and-egg problem by approaching modeling and registration together. By minimizing
a single objective function, we reliably obtain high quality registration of noisy, incomplete, laser scans, while simultaneously learning a highly realistic articulated body model. The model greatly improves robustness
to noise and missing data. Since the model explains a corpus of body scans, it captures how body shape varies across people and poses.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems