Michael J. Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. in computer science from Yale University (1992). After research at NASA Ames and post-doctoral research at the University of Toronto, he joined the Xerox Palo Alto Research Center in 1993 where he later managed the Image Understanding Area and founded the Digital Video Analysis group. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is a founding director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is also a Distinguished Amazon Scholar, an Honorarprofessor at the University of Tuebingen, and Adjunct Professor at Brown University.
Black is a foreign member of the Royal Swedish Academy of Sciences. He is a recipient of the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision and the 2013 Helmholtz Prize for work that has stood the test of time. His work has won several paper awards including the IEEE Computer Society Outstanding Paper Award (CVPR'91). His work received Honorable Mention for the Marr Prize in 1999 and 2005. His early work on optical flow has been widely used in Hollywood films including for the Academy-Award-winning effects in “What Dreams May Come” and “The Matrix Reloaded.” He has contributed to several influential datasets including the Middlebury Flow dataset, HumanEva, and the Sintel dataset. Black has coauthored over 200 peer-reviewed scientific publications.
He is also active in commercializing scientific results, is an inventor on 10 issued patents, and has advised multiple startups. He uniquely combines computer vision, graphics, and machine learning to solve problems in the clothing industry. In 2013, he co-founded Body Labs Inc., which used computer vision, machine learning, and graphics technology licensed from his lab to commercialize "the body as a digital platform." Body Labs was acquired by Amazon in 2017.
Black's research interests in machine vision include optical flow estimation, 3D shape models, human shape and motion analysis, robust statistical methods, and probabilistic models of the visual world. In computational neuroscience his work focuses on probabilistic models of the neural code and applications of neural decoding in neural prosthetics.
Michael Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. from Yale University (1992). After post-doctoral research at the University of Toronto, he worked at Xerox PARC as a member of research staff and area manager. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is one of the founding directors at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is also a Distinguished Amazon Scholar, an Honorarprofessor at the University of Tuebingen, and Adjunct Professor at Brown University. His work has won several awards including the IEEE Computer Society Outstanding Paper Award (1991), Honorable Mention for the Marr Prize (1999 and 2005), the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision, and the 2013 Helmholtz Prize for work that has stood the test of time. He is a foreign member of the Royal Swedish Academy of Sciences. In 2013 he co-founded Body Labs Inc., which was acquired by Amazon in 2017.
Royal Swedish Academy of Sciences
Foreign member, Class for Engineering Sciences, since June 2015.
for the paper: Black, M. J., and Anandan, P., "A framework for the robust estimation of optical flow,'' IEEE International Conference on Computer Vision, ICCV, pages 231-236, Berlin, Germany. May 1993.
2010Koenderink Prize for Fundamental Contributions in Computer Vision,
with Sidenbladh, H. and Fleet, D. J. for the paper "Stochastic tracking of 3D human figures using 2D image motion,'' European Conference on Computer Vision, 2000.
Best Paper Award, Eurographics 2017, for the paper "Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs", by von Marcard, T., Rosenhahn, B., Black, M. J., Pons-Moll, G.
"Dataset Award" at the Eurographics Symposium on Geometry Processing 2016, with F. Bogo, J. Romero, and M. Loper, for the paper "FAUST: Dataset and evaluation for 3D mesh registration," CVPR 2014.
Best Paper Award, International Conference on 3D Vision (3DV), 2015, with A. O. Ulusoy and A. Geiger, for the paper "Towards Probabilistic Volumetric Reconstruction using Ray Potentials."
Best Paper Award, INI-Graphics Net, 2008, First Prize Winner of Category Research,
with S. Roth for the paper "Steerable random fields."
Best Paper Award, Fourth International Conference on Articulated Motion and Deformable Objects (AMDO-e 2006), with L. Sigal for the paper "Predicting 3D people from 2D pictures.''
Marr Prize, Honorable Mention, Int. Conf. on Computer Vision, ICCV-2005, Beijing, China, Oct. 2005 with S. Roth for the paper "On the spatial statistics of optical flow.''
Marr Prize, Honorable Mention, Int. Conf. on Computer Vision, ICCV-99, Corfu, Greece, Sept. 1999 with D. J. Fleet for the paper "Probabilistic detection and tracking of motion discontinuities.''
IEEE Computer Society, Outstanding Paper Award, Conference on Computer Vision and Pattern Recognition, Maui, Hawaii, June 1991 with P. Anandan for the paper "Robust dynamic motion estimation over time.''
Commendation and Chief's Award, Henrico County Division of Police,
County of Henrico, Virginia, April 19, 2007.
University of Maryland, Invention of the Year, 1995, "Tracking and Recognizing Facial Expressions,'' with Y. Yacoob.
University of Toronto, Computer Science Students' Union Teaching Award for 1992-1993.
My research addressed the problem of estimating and explaining motion in image sequences. I developed methods detecting and tracking 2D and 3D human motion including the introduction of particle filtering for 3D human tracking and belief propagation for 3D human pose estimation. I worked on probabilistic models of images include the high-order Field of Experts model. I worked on 3D human shape estimation from images and video and developed applications of this technology. I also developed mathematical models for decoding neural signals. This included the first uses of particle filtering and Kalman filtering for decoding motor cortical neural activity and the first point-and-click cortical neural brain-machine-interface for people with paralysis.
Research included modeling image changes (motion, illumination, specularity, occlusion, etc.) in video as a mixture of causes. I developed methods of motion explanation; that is, the extraction of mid-level or high-level concepts from motion.This included the modeling and recognition of motion "features" (occlusion boundaries, moving bars, etc.), human facial expressions and gestures, and motion "texture" (plants, fire, water, etc.). I applied these methods to problems in video indexing, motion for video annotation, teleconferencing, and gestural user interfaces. Other research included robust learning of image-based models, regularization with transparency, anisotropic diffusion, and the recovery of multiple shapes from transparent textures.
Research included the application of mixture models to optical flow, detection and tracking of surface discontinuities using motion information, and robust surface recovery in dynamic environments.
Yale University, (9/89-8/92) New Haven, CT
Research Assistant, Department of Computer Science.
Research in the recovery of optical flow, incremental estimation, temporal continuity, applications of robust statistics to optical flow, the relationship between robust statistics and line processes, the early detection of motion discontinuities, and the role of representation in computer vision.
Developed motion estimation algorithms in the context of an autonomous Mars landing and nap-of-the-earth helicopter flight and studied the psychophysical implications of a temporal continuity assumption.
Research on spatial reasoning for robotic vehicle route planning and terrain analysis. Vision research including perceptual grouping, object-based translational motion processing, the integration of vision and control for an autonomous vehicle, object modeling using generalized cylinders, and the development of an object-oriented vision environment.
GTE Government Systems, (6/85-12/86) Mountain View, CA
Engineer, Artificial Intelligence Group.
Developed expert systems for multi-source data fusion and fault location.
Summer undergraduate researcher at UBC; park ranger's assistant; volunteer firefighter, busboy; and probably my worst job: cleaning dog kennels.
I am interested in motion. What does motion tell us about the structure of the world and how can we compute this from video? How do humans and animals move? How does the brain control complex movement? My work combines computer vision, graphics and neuroscience to develop new models and algorithms to capture and analyze the motion of the world.
My Computer Vision research addresses:
the estimation of scene structure and physical properties from video;
modeling the neural control of reaching and grasping;
novel neural decoding algorithms;
neural prostheses and cortical brain-machine interfaces;
markless animal motion capture.
I also work on industrial applications in Fashion Science:
Body scanning and measurement;
cloth capture and modeling;
What is maybe unique about my work is the combination of the these themes. For example I study human motion from the inside (decoding neural activity in paralyzed humans) and the outside (with novel motion capture techniques).
Frank Wood, Associate Professor, Department of Engineering, Oxford
Thesis: Nonparametric Bayesian modeling of neural data. Department of Computer Science, Brown University
Hulya Yalcin, Assistant Professor, Department of Electronics and Communications Engineering, Istanbul Technical University, Turkey
Thesis: Implicit models of moving and static surfaces, Division of Engineering, Brown University, May 2004
Wei Wu, Associate Professor, Dept. of Statistics, Florida State
Thesis: Statistical models of neural coding in motor cortex, Division of Applied Math, Brown University. Co-supervised with David Mumford. May 2004.
Fernando De la Torre, Research Associate Professor, CMU and Facebook,
Thesis: Robust subspace learning for computer vision, La Salle School of Engineering. Universitat Ramon Llull, Barcelona, Spain. Jan. 2002
My old Brown site has several image sequences used in my older publications. These include some classic sequences such as Yosemite, the Pepsi can, the SRI tree sequence, and the Flower Garden sequence.
A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles behind Them
Sun, D., Roth, S., and Black, M.J. International Journal of Computer Vision (IJCV), 106(2):115-137, 2014. (pdf)
Secrets of optical flow estimation and their principles
Sun, D., Roth, S., and Black, M. J., IEEE Conf. on Computer Vision and Pattern Recog., CVPR, June 2010. (pdf)
This method implements many of the currently best known techniques for accurate optical flow and was once ranked #1 on the Middlebury evaluation (June 2010).
The software is made available for research pupropses. Please read the copyright statement and contact me for commerical licensing.
2. Matlab implmentation of the Black and Anandan dense optical flow method
The Matlab flow code is easier to use and more accurate than the original C code. The objective function being optimized is the same but the Matlab version uses more modern optimization methods:
The method in 1 above is more accurate and also implements Black and Anandan plus much more.
3. Original Black and Anandan method implemented in C
The optical flow software here has been used by a number of graphics companies to make special effects for movies. This software is provided for research purposes only; any sale or use for commercial purposes is strictly prohibited.
Contact me for the password to download the software, stating that it is for research purposes.
Please contact me if you wish to use this code for commercial purpose.
If you are a commercial enterprise and would like assistance in using optical flow in your application, please contact me at my consulting address firstname.lastname@example.org.
This is EXPERIMENTAL software. It is provided to illustrate some ideas in the robust estimation of optical flow. Use at your own risk. No warranty is implied by this distribution.
The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields,
Black, M. J. and Anandan, P., Computer Vision and Image Understanding, CVIU, 63(1), pp. 75-104, Jan. 1996. (pdf),(pdf from publisher)
Robust Principal Component Analysis (PCA)
Software is from the ICCV'2001 paper with Fernando De la Torre.
The code below provides a simple Matlab implementation of the Bayesian 3D person tracking system described in ECCV'00 and ICCV'01. It is too slow to be used to track the entire body but can be used to track various limbs and provides a basis for people who want to understand the methods better and extend them.
Stochastic tracking of 3D human figures using 2D image motion,
Sidenbladh, H., Black, M. J., and Fleet, D.J., European Conference on Computer Vision, D. Vernon (Ed.), Springer Verlag, LNCS 1843, Dublin, Ireland, pp. 702-718 June 2000. (postscript)(pdf), (abstract)
Software. (Note: if you uncompress and untar this on a PC using Winzip, the path names may be lost which will cause Matlab to fail when you load the .mat files. Instead uncompress/untar using gunzip and tar.)
International Journal of Computer Vision, 54(1-3):117-142, August 2003 (article)
Many computer vision, signal processing and statistical problems can be posed as problems of learning low dimensional linear or multi-linear models. These models have been widely used for the representation of shape, appearance, motion, etc., in computer vision applications. Methods for learning linear models can be seen as a special case of subspace fitting. One draw-back of previous learning methods is that they are based on least squares estimation techniques and hence fail to account for “outliers” which are common in realistic training sets. We review previous approaches for making linear learning methods robust to outliers and present a new method that uses an intra-sample outlier process to account for pixel outliers. We develop the theory of Robust Subspace Learning (RSL) for linear models within a continuous optimization framework based on robust M-estimation. The framework applies to a variety of linear learning problems in computer vision including eigen-analysis and structure from motion. Several synthetic and natural examples are used to develop and illustrate the theory and applications of robust subspace learning in computer vision.
Computer Vision and Image Understanding, 91(1-2):53-71, July 2003 (article)
Principal component analysis (PCA) has been successfully applied to construct linear models of shape, graylevel, and motion in images. In particular, PCA has been widely used to model the variation in the appearance of people's faces. We extend previous work on facial modeling for tracking faces in video sequences as they undergo significant changes due to facial expressions. Here we consider person-specific facial appearance models (PSFAM), which use modular PCA to model complex intra-person appearance changes. Such models require aligned visual training data; in previous work, this has involved a time consuming and error-prone hand alignment and cropping process. Instead, the main contribution of this paper is to introduce parameterized component analysis to learn a subspace that is invariant to affine (or higher order) geometric transformations. The automatic learning of a PSFAM given a training image sequence is posed as a continuous optimization problem and is solved with a mixture of stochastic and deterministic techniques achieving sub-pixel accuracy. We illustrate the use of the 2D PSFAM model with preliminary experiments relevant to applications including video-conferencing and avatar animation.
In Exploring Artificial Intelligence in the New Millennium, pages: 139-174, (Editors: Lakemeyer, G. and Nebel, B.), Morgan Kaufmann Pub., July 2002 (incollection)
This chapter addresses an open problem in visual motion analysis, the estimation of image motion in the vicinity of occlusion boundaries. With a Bayesian formulation, local image motion is explained in terms of multiple, competing, nonlinear models, including models for smooth (translational) motion and for motion boundaries. The generative model for motion boundaries explicitly encodes the orientation of the boundary, the velocities on either side, the motion of the occluding edge over time, and the appearance/disappearance of pixels at the boundary. We formulate the posterior probability distribution over the models and model parameters, conditioned on the image sequence. Approximate inference is achieved with a combination of tools: A Bayesian filter provides for online computation; factored sampling allows us to represent multimodal non-Gaussian distributions and to propagate beliefs with nonlinear dynamics from one time to the next; and mixture models are used to simplify the computation of joint prediction distributions in the Bayesian filter. To efficiently represent such a high-dimensional space, we also initialize samples using the responses of a low-level motion-discontinuity detector. The basic formulation and computational model provide a general probabilistic framework for motion estimation with multiple, nonlinear models.
In European Conf. on Computer Vision, ECCV 2002, 1, pages: 692-706, LNCS 2353, (Editors: A. Heyden and G. Sparr and M. Nielsen and P. Johansen), Springer-Verlag , 2002 (inproceedings)
We describe a 2.5D layered representation for visual motion analysis. The representation provides a global interpretation of image motion in terms of several spatially localized foreground regions along with a background region. Each of these regions comprises a parametric shape model and a parametric motion model. The representation also contains depth ordering so visibility and occlusion are rightly included in the estimation of the model parameters. Finally, because the number of objects, their positions, shapes and sizes, and their relative depths are all unknown, initial models are drawn from a proposal distribution, and then compared using a penalized likelihood criterion. This allows us to automatically initialize new models, and to compare different depth orderings.
In European Conf. on Computer Vision, ECCV 2002, 1, pages: 476-491, LNCS 2353, (Editors: A. Heyden and G. Sparr and M. Nielsen and P. Johansen), Springer-Verlag , 2002 (inproceedings)
This paper proposes a solution for the automatic detection and tracking of human motion in image sequences. Due to the complexity of the human body and its motion, automatic detection of 3D human motion remains an open, and important, problem. Existing approaches for automatic detection and tracking focus on 2D cues and typically exploit object appearance (color distribution, shape) or knowledge of a static background. In contrast, we exploit 2D optical flow information which provides rich descriptive cues, while being independent of object and background appearance. To represent the optical flow patterns of people from arbitrary viewpoints, we develop a novel representation of human motion using low-dimensional spatio-temporal models that are learned using motion capture data of human subjects. In addition to human motion (the foreground) we probabilistically model the motion of generic scenes (the background); these statistical models are defined as Gibbsian fields specified from the first-order derivatives of motion observations. Detection and tracking are posed in a principled Bayesian framework which involves the computation of a posterior probability distribution over the model parameters (i.e., the location and the type of the human motion) given a sequence of optical flow observations. Particle filtering is used to represent and predict this non-Gaussian posterior distribution over time. The model parameters of samples from this distribution are related to the pose parameters of a 3D articulated model (e.g. the approximate joint angles and movement direction). Thus the approach proves suitable for initializing more complex probabilistic models of human motion. As shown by experiments on real image sequences, our method is able to detect and track people under different viewpoints with complex backgrounds.
In European Conf. on Computer Vision, 1, pages: 784-800, 2002 (inproceedings)
This paper addresses the problem of probabilistically modeling 3D human motion for synthesis and tracking. Given the high dimensional nature of human motion, learning an explicit probabilistic model from available training data is currently impractical. Instead we exploit methods from texture synthesis that treat images as representing an implicit empirical distribution. These methods replace the problem of representing the probability of a texture pattern with that of searching the training data for similar instances of that pattern. We extend this idea to temporal data representing 3D human motion with a large database of example motions. To make the method useful in practice, we must address the problem of efficient search in a large training set; efficiency is particularly important for tracking. Towards that end, we learn a low dimensional linear model of human motion that is used to structure the example motion database into a binary tree. An approximate probabilistic tree search method exploits the coefficients of this low-dimensional representation and runs in sub-linear time. This probabilistic tree search returns a particular sample human motion with probability approximating the true distribution of human motions in the database. This sampling method is suitable for use with particle filtering techniques and is applied to articulated 3D tracking of humans within a Bayesian framework. Successful tracking results are presented, along with examples of synthesizing human motion using the model.
Ormoneit, D., Sidenbladh, H., Black, M. J., Hastie, T.
Learning and tracking cyclic human motion
In Advances in Neural Information Processing Systems 13, NIPS, pages: 894-900, (Editors: Leen, Todd K. and Dietterich, Thomas G. and Tresp, Volker), The MIT Press, 2001 (inproceedings)
In In Proc. 5th World Congress of the Bernoulli Society for Probability and Mathematical Statistics and 63rd Annual Meeting of the Institute of Mathematical Statistics, Guanajuato, Mexico, May 2000 (inproceedings)
Ormoneit, D., Hastie, T., Black, M. J.Functional analysis of human motion data
In In Proc. 5th World Congress of the Bernoulli Society for Probability and Mathematical Statistics and 63rd Annual Meeting of the Institute of Mathematical Statistics, Guanajuato, Mexico, May 2000 (inproceedings)
In European Conference on Computer Vision, ECCV, pages: 702-718, LNCS 1843, Springer Verlag, Dublin, Ireland, June 2000 (inproceedings)
A probabilistic method for tracking 3D articulated human figures in monocular image sequences is presented. Within a Bayesian framework, we define a generative model of image appearance, a robust likelihood function based on image gray level differences, and a prior probability distribution over pose and joint angles that models how humans move. The posterior probability distribution over model parameters is represented using a discrete set of samples and is propagated over time using particle filtering. The approach extends previous work on parameterized optical flow estimation to exploit a complex 3D articulated motion model. It also extends previous work on human motion tracking by including a perspective camera model, by modeling limb self occlusion, and by recovering 3D motion from a monocular sequence. The explicit posterior probability distribution represents ambiguities due to image matching, model singularities, and perspective projection. The method relies only on a frame-to-frame assumption of brightness constancy and hence is able to track people under changing viewpoints, in grayscale image sequences, and with complex unknown backgrounds.
Computer Vision and Image Understanding, 78(1):8-31, 2000 (article)
We propose a generalized model of image “appearance change” in which brightness variation over time is represented as a probabilistic mixture of different causes. We define four generative models of appearance change due to (1) object or camera motion; (2) illumination phenomena; (3) specular reflections; and (4) “iconic changes” which are specific to the objects being viewed. These iconic changes include complex occlusion events and changes in the material properties of the objects. We develop a robust statistical framework for recovering these appearance changes in image sequences. This approach generalizes previous work on optical flow to provide a richer description of image events and more reliable estimates of image motion in the presence of shadows and specular reflections.
Int. J. of Computer Vision, 36(3):171-193, 2000 (article)
Linear parameterized models of optical flow, particularly affine models, have become widespread in image motion analysis. The linear model coefficients are straightforward to estimate, and they provide reliable estimates of the optical flow of smooth surfaces. Here we explore the use of parameterized motion models that represent much more varied and complex motions. Our goals are threefold: to construct linear bases for complex motion phenomena; to estimate the coefficients of these linear models; and to recognize or classify image motions from the estimated coefficients. We consider two broad classes of motions: i) generic “motion features” such as motion discontinuities and moving bars; and ii) non-rigid, object-specific, motions such as the motion of human mouths. For motion features we construct a basis of steerable flow fields that approximate the motion features. For object-specific motions we construct basis flow fields from example motions using principal component analysis. In both cases, the model coefficients can be estimated directly from spatiotemporal image derivatives with a robust, multi-resolution scheme. Finally, we show how these model coefficients can be use to detect and recognize specific motions such as occlusion boundaries and facial expressions.
Int. J. of Computer Vision, 38(3):231-245, July 2000 (article)
We propose a Bayesian framework for representing and recognizing local image motion in terms of two basic models: translational motion and motion boundaries. Motion boundaries are represented using a non-linear generative model that explicitly encodes the orientation of the boundary, the velocities on either side, the motion of the occluding edge over time, and the appearance/disappearance of pixels at the boundary. We represent the posterior probability distribution over the model parameters given the image data using discrete samples. This distribution is propagated over time using a particle filtering algorithm. To efficiently represent such a high-dimensional space we initialize samples using the responses of a low-level motion discontinuity detector. The formulation and computational model provide a general probabilistic framework for motion estimation with multiple, non-linear, models.
In Scale-Space Theories in Computer Vision, Second Int. Conf., Scale-Space ’99, pages: 259-270, LNCS 1682, Springer, Corfu, Greece, September 1999 (inproceedings)
Edges are viewed as statistical outliers with respect to local image gradient magnitudes. Within local image regions we compute a robust statistical measure of the gradient variation and use this in an anisotropic diffusion framework to determine a spatially varying "edge-stopping" parameter σ. We show how to determine this parameter for two edge-stopping functions described in the literature (Perona-Malik and the Tukey biweight). Smoothing of the image is related the local texture and in regions of low texture, small gradient values may be treated as edges whereas in regions of high texture, large gradient magnitudes are necessary before an edge is preserved. Intuitively these results have similarities with human perceptual phenomena such as masking and "popout". Results are shown on a variety of standard images.
Computer Vision and Image Understanding, 73(2):232-247, 1999 (article)
In this paper we consider a class of human activities—atomic activities—which can be represented as a set of measurements over a finite temporal window (e.g., the motion of human body parts during a walking cycle) and which has a relatively small space of variations in performance. A new approach for modeling and recognition of atomic activities that employs principal component analysis and analytical global transformations is proposed. The modeling of sets of exemplar instances of activities that are similar in duration and involve similar body part motions is achieved by parameterizing their representation using principal component analysis. The recognition of variants of modeled activities is achieved by searching the space of admissible parameterized transformations that these activities can undergo. This formulation iteratively refines the recognition of the class to which the observed activity belongs and the transformation parameters that relate it to the model in its class. We provide several experiments on recognition of articulated and deformable human motions from image motion parameters.
Yacoob, Y., Davis, L. S., Black, M., Gavrila, D., Horprasert, T., Morimoto, C.
Looking at people in action - An overview
In Computer Vision for Human–Machine Interaction, (Editors: R. Cipolla and A. Pentland), Cambridge University Press, 1998 (incollection)
In Sixth International Conf. on Computer Vision, ICCV’98, pages: 120-127, Mumbai, India, January 1998 (inproceedings)
A framework for modeling and recognition of temporal activities is proposed. The modeling of sets of exemplar activities is achieved by parameterizing their representation in the form of principal components. Recognition of spatio-temporal variants of modeled activities is achieved by parameterizing the search in the space of admissible transformations that the activities can undergo. Experiments on recognition of articulated and deformable object motion from image motion parameters are presented.
In Sixth International Conf. on Computer Vision, ICCV’98, pages: 660-667, Mumbai, India, January 1998 (inproceedings)
Image "appearance" may change over time due to a variety of causes such as 1) object or camera motion; 2) generic photometric events including variations in illumination (e.g. shadows) and specular reflections; and 3) "iconic changes" which are specific to the objects being viewed and include complex occlusion events and changes in the material properties of the objects. We propose a general framework for representing and recovering these "appearance changes" in an image sequence as a "mixture" of different causes. The approach generalizes previous work on optical flow to provide a richer description of image events and more reliable estimates of image motion.
Black, M., Berard, F., Jepson, A., Newman, W., Saund, E., Socher, G., Taylor, M.
The Digital Office: Overview
In AAAI Spring Symposium on Intelligent Environments, pages: 1-6, Stanford, March 1998 (inproceedings)
In IEEE Conf. on Computer Vision and Pattern Recognition, CVPR-98, pages: 274-281, IEEE, Santa Barbara, CA, 1998 (inproceedings)
The estimation and detection of occlusion boundaries and moving bars are important and challenging problems in image sequence analysis. Here, we model such motion features as linear combinations of steerable basis flow fields. These models constrain the interpretation of image motion, and are used in the same way as translational or affine motion models. We estimate the subspace coefficients of the motion feature models directly from spatiotemporal image derivatives using a robust regression method. From the subspace coefficients we detect the presence of a motion feature and solve for the orientation of the feature and the relative velocities of the surfaces. Our method does not require the prior computation of optical flow and recovers accurate estimates of orientation and velocity.
International Journal of Computer Vision, 26(1):63-84, 1998 (article)
This paper describes an approach for tracking rigid and articulated objects using a view-based representation. The approach builds on and extends work on eigenspace representations, robust estimation techniques, and parameterized optical flow estimation. First, we note that the least-squares image reconstruction of standard eigenspace techniques has a number of problems and we reformulate the reconstruction problem as one of robust estimation. Second we define a “subspace constancy assumption” that allows us to exploit techniques for parameterized optical flow estimation to simultaneously solve for the view of an object and the affine transformation between the eigenspace and the image. To account for large affine transformations between the eigenspace and the image we define a multi-scale eigenspace representation and a coarse-to-fine matching strategy. Finally, we use these techniques to track objects over long image sequences in which the objects simultaneously undergo both affine image motions and changes of view. In particular we use this “EigenTracking” technique to track and recognize the gestures of a moving hand.
IEEE Transactions on Image Processing, 7(3):421-432, March 1998 (article)
Relations between anisotropic diffusion and robust statistics are described in this paper. Specifically, we show that anisotropic diffusion can be seen as a robust estimation procedure that estimates a piecewise smooth image from a noisy input image. The edge-stopping; function in the anisotropic diffusion equation is closely related to the error norm and influence function in the robust estimation framework. This connection leads to a new edge-stopping; function based on Tukey's biweight robust estimator that preserves sharper boundaries than previous formulations and improves the automatic stopping of the diffusion. The robust statistical interpretation also provides a means for detecting the boundaries (edges) between the piecewise smooth regions in an image that has been smoothed with anisotropic diffusion. Additionally, we derive a relationship between anisotropic diffusion and regularization with line processes. Adding constraints on the spatial organization of the line processes allows us to develop new anisotropic diffusion equations that result in a qualitative improvement in the continuity of edges
Tsotsos, J. K., Verghese, G., Dickinson, S., Jenkin, M., Jepson, A., Milios, E., Nuflo, F., Stevenson, S., Black, M., Metaxas, D., Culhane, S., Ye, Y., Mann, R.
Image & Vision Computing, Special Issue on Vision for the Disabled, 16(4):275-292, 1998 (article)
This paper overviews the PLAYBOT project, a long-term, large-scale research program whose goal is to provide a directable robot which may enable physically disabled children to access and manipulate toys. This domain is the first test domain, but there is nothing inherent in the design of PLAYBOT that prohibits its extension to other tasks. The research is guided by several important goals: vision is the primary sensor; vision is task directed; the robot must be able to visually search its environment; object and event recognition are basic capabilities; environments must be natural and dynamic; users and environments are assumed to be unpredictable; task direction and reactivity must be smoothly integrated; and safety is of high importance. The emphasis of the research has been on vision for the robot this is the most challenging research aspect and the major bottleneck to the development of intelligent robots. Since the control framework is behavior-based, the visual capabilities of PLAYBOT are described in terms of visual behaviors. Many of the components of PLAYBOT are briefly described and several examples of implemented sub-systems are shown. The paper concludes with a description of the current overall system implementation, and a complete example of PLAYBOT performing a simple task.
IEEE Trans. on Circuits and Systems for Video Technology, 8(5):686-696, September 1998 (article)
This paper presents an automatic system for analyzing and annotating video sequences of technical talks. Our method uses a robust motion estimation technique to detect key frames and segment the video sequence into subsequences containing a single overhead slide. The subsequences are stabilized to remove motion that occurs when the speaker adjusts their slides. Any changes remaining between frames in the stabilized sequences may be due to speaker gestures such as pointing or writing, and we use active contours to automatically track these potential gestures. Given the constrained domain, we define a simple set of actions that can be recognized based on the active contour shape and motion. The recognized actions provide an annotation of the sequence that can be used to access a condensed version of the talk from a Web page.
In IEEE Conf. on Computer Vision and Pattern Recognition, pages: 595-601, CVPR-97, Puerto Rico, June 1997 (inproceedings)
In this paper, we present an automatic system for analyzing and annotating video sequences of technical talks. Our method uses a robust motion estimation technique to detect key frames and segment the video sequence into subsequences containing a single overhead slide. The subsequences are stabilized to remove motion that occurs when the speaker adjusts their slides. Any changes remaining between frames in the stabilized sequences may be due to speaker gestures such as pointing or writing and we use active contours to automatically track these potential gestures. Given the constrained domain we define a simple ``vocabulary'' of actions which can easily be recognized based on the active contour shape and motion. The recognized actions provide a rich annotation of the sequence that can be used to access a condensed version of the talk from a web page.
In IEEE Conf. on Computer Vision and Pattern Recognition, CVPR-97, pages: 561-567, Puerto Rico, June 1997 (inproceedings)
A framework for learning parameterized models of optical flow from image sequences is presented. A class of motions is represented by a set of orthogonal basis flow fields that are computed from a training set using principal component analysis. Many complex image motions can be represented by a linear combination of a small number of these basis flows. The learned motion models may be used for optical flow estimation and for model-based recognition. For optical flow estimation we describe a robust, multi-resolution scheme for directly computing the parameters of the learned flow models from image derivatives. As examples we consider learning motion discontinuities, non-rigid motion of human mouths, and articulated human motion.
In Int. Conf. on Image Processing, ICIP, 1, pages: 263-266, Vol. 1, Santa Barbara, CA, October 1997 (inproceedings)
Relations between anisotropic diffusion and robust statistics are described. We show that anisotropic diffusion can be seen as a robust estimation procedure that estimates a piecewise smooth image from a noisy input image. The "edge-stopping" function in the anisotropic diffusion equation is closely related to the error norm and influence function in the robust estimation framework. This connection leads to a new "edge-stopping" function based on Tukey's biweight robust estimator, that preserves sharper boundaries than previous formulations and improves the automatic stopping of the diffusion. The robust statistical interpretation also provides a means for detecting the boundaries (edges) between the piecewise smooth regions in the image. We extend the framework to vector-valued images and show applications to robust image sharpening.
Int. Journal of Computer Vision, 25(1):23-48, 1997 (article)
This paper explores the use of local parametrized models of image motion for recovering and recognizing the non-rigid and articulated motion of human faces. Parametric flow models (for example affine) are popular for estimating motion in rigid scenes. We observe that within local regions in space and time, such models not only accurately model non-rigid facial motions but also provide a concise description of the motion in terms of a small number of parameters. These parameters are intuitively related to the motion of facial features during facial expressions and we show how expressions such as anger, happiness, surprise, fear, disgust, and sadness can be recognized from the local parametric motions in the presence of significant head motion. The motion tracking and expression recognition approach performed with high accuracy in extensive laboratory experiments involving 40 subjects as well as in television and movie sequences.
PRECARN ARK Project Technical Report ARK96-PUB-54, March 1996 (techreport)
We consider the estimation of local greylevel image structure in terms of a layered representation. This type of representation has recently been successfully used to segment various objects from clutter using either optical ow or stereo disparity information. We argue that the same type of representation is useful for greylevel data in that it allows for the estimation of properties for each of several different components without prior segmentation. Our emphasis in this paper is on the process used to extract such a layered representation from a given image In particular we consider a variant of the EM algorithm for the estimation of the layered model and consider a
novel technique for choosing the number of layers to use. We briefly consider the use of a simple version of this approach for image segmentation and suggest two potential applications to the ARK project
In 2nd Int. Conf. on Automatic Face- and Gesture-Recognition, pages: 38-44, Killington, Vermont, October 1996 (inproceedings)
We extend the work of Black and Yacoob on the tracking and recognition of human facial expressions using parameterized models of optical flow to deal with the articulated motion of human limbs. We define a "cardboard person model" in which a person's limbs are represented by a set of connected planar patches. The parameterized image motion of these patches is constrained to enforce articulated motion and is solved for directly using a robust estimation technique. The recovered motion parameters provide a rich and concise description of the activity that can be used for recognition. We propose a method for performing view-based recognition of human activities from the optical flow parameters that extends previous methods to cope with the cyclical nature of human motion. We illustrate the method with examples of tracking human legs over long image sequences.
Computer Vision and Image Understanding, 63(1):75-104, January 1996 (article)
Most approaches for estimating optical flow assume that, within a finite image region, only a single motion is present. This single motion assumption is violated in common situations involving transparency, depth discontinuities, independently moving objects, shadows, and specular reflections. To robustly estimate optical flow, the single motion assumption must be relaxed. This paper presents a framework based on robust estimation that addresses violations of the brightness constancy and spatial smoothness assumptions caused by multiple motions. We show how the robust estimation framework can be applied to standard formulations of the optical flow problem thus reducing their sensitivity to violations of their underlying assumptions. The approach has been applied to three standard techniques for recovering optical flow: area-based regression, correlation, and regularization with motion discontinuities. This paper focuses on the recovery of multiple parametric motion models within a region, as well as the recovery of piecewise-smooth flow fields, and provides examples with natural and synthetic image sequences.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems