Michael J. Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. in computer science from Yale University (1992). After research at NASA Ames and post-doctoral research at the University of Toronto, he joined the Xerox Palo Alto Research Center in 1993 where he later managed the Image Understanding Area and founded the Digital Video Analysis group. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is a founding director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is an honorary professor at the University of Tübingen, a visiting professor at ETH Zürich, and an adjunct professor (research) at Brown University.
Black is a foreign member of the Royal Swedish Academy of Sciences. He is a recipient of the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision and the 2013 Helmholtz Prize for work that has stood the test of time. His work has won several paper awards including the IEEE Computer Society Outstanding Paper Award (CVPR'91). His work received Honorable Mention for the Marr Prize in 1999 and 2005. His early work on optical flow has been widely used in Hollywood films including for the Academy-Award-winning effects in “What Dreams May Come” and “The Matrix Reloaded.” He has contributed to several influential datasets including the Middlebury Flow dataset, HumanEva, and the Sintel dataset. He is a co-founder, science advisor, and member of the board of directors of Body Labs Inc., which is commercializing his team’s research on 3D human body shape.
Prof. Black's research interests in machine vision include optical flow estimation, 3D shape models, human shape and motion analysis, robust statistical methods, and probabilistic models of the visual world. In computational neuroscience his work focuses on probabilistic models of the neural code and applications of neural decoding in neural prosthetics.
Michael Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. from Yale University (1992). After post-doctoral research at the University of Toronto, he worked at Xerox PARC as a member of research staff and an area manager. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is one of the founding directors at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is an Honorarprofessor at the University of Tuebingen, Visiting Professor at ETH Zürich, and Adjunct Professor (Research) at Brown University. His work has won several awards including the IEEE Computer Society Outstanding Paper Award (1991), Honorable Mention for the Marr Prize (1999 and 2005), the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision, and the 2013 Helmholtz Prize for work that has stood the test of time. He is a foreign member of the Royal Swedish Academy of Sciences. He is also a co-founder, science advisor, and board member of Body Labs Inc.
Royal Swedish Academy of Sciences
Foreign member, Class for Engineering Sciences, since June 2015.
for the paper: Black, M. J., and Anandan, P., "A framework for the robust estimation of optical flow," IEEE International Conference on Computer Vision, ICCV, pages 231-236, Berlin, Germany. May 1993.
2010 Koenderink Prize for Fundamental Contributions in Computer Vision,
with Sidenbladh, H. and Fleet, D. J. for the paper "Stochastic tracking of 3D human figures using 2D image motion," European Conference on Computer Vision, 2000.
"Dataset Award" at the Eurographics Symposium on Geometry Processing 2016, with F. Bogo, J. Romero, and M. Loper, for the paper "FAUST: Dataset and evaluation for 3D mesh registration," CVPR 2014.
Best Paper Award, International Conference on 3D Vision (3DV), 2015, with A. O. Ulusoy and A. Geiger, for the paper "Towards Probabilistic Volumetric Reconstruction using Ray Potentials."
Best Paper Award, INI-Graphics Net, 2008, First Prize Winner of Category Research,
with S. Roth for the paper "Steerable random fields."
Best Paper Award, Fourth International Conference on Articulated Motion and Deformable Objects (AMDO-e 2006), with L. Sigal for the paper "Predicting 3D people from 2D pictures."
Marr Prize, Honorable Mention, Int. Conf. on Computer Vision, ICCV-2005, Beijing, China, Oct. 2005 with S. Roth for the paper "On the spatial statistics of optical flow."
Marr Prize, Honorable Mention, Int. Conf. on Computer Vision, ICCV-99, Corfu, Greece, Sept. 1999 with D. J. Fleet for the paper "Probabilistic detection and tracking of motion discontinuities."
IEEE Computer Society, Outstanding Paper Award, Conference on Computer Vision and Pattern Recognition, Maui, Hawaii, June 1991 with P. Anandan for the paper "Robust dynamic motion estimation over time."
Commendation and Chief's Award, Henrico County Division of Police,
County of Henrico, Virginia, April 19, 2007.
University of Maryland, Invention of the Year, 1995, "Tracking and Recognizing Facial Expressions," with Y. Yacoob.
University of Toronto, Computer Science Students' Union Teaching Award for 1992-1993.
My research addressed the problem of estimating and explaining motion in image sequences. I developed methods for detecting and tracking 2D and 3D human motion, including the introduction of particle filtering for 3D human tracking and belief propagation for 3D human pose estimation. I worked on probabilistic models of images, including the high-order Field of Experts model. I worked on 3D human shape estimation from images and video and developed applications of this technology. I also developed mathematical models for decoding neural signals. This included the first uses of particle filtering and Kalman filtering for decoding motor cortical neural activity and the first point-and-click cortical neural brain-machine-interface for people with paralysis.
Research included modeling image changes (motion, illumination, specularity, occlusion, etc.) in video as a mixture of causes. I developed methods of motion explanation; that is, the extraction of mid-level or high-level concepts from motion. This included the modeling and recognition of motion "features" (occlusion boundaries, moving bars, etc.), human facial expressions and gestures, and motion "texture" (plants, fire, water, etc.). I applied these methods to problems in video indexing, motion for video annotation, teleconferencing, and gestural user interfaces. Other research included robust learning of image-based models, regularization with transparency, anisotropic diffusion, and the recovery of multiple shapes from transparent textures.
Research included the application of mixture models to optical flow, detection and tracking of surface discontinuities using motion information, and robust surface recovery in dynamic environments.
Yale University, (9/89-8/92) New Haven, CT
Research Assistant, Department of Computer Science.
Research in the recovery of optical flow, incremental estimation, temporal continuity, applications of robust statistics to optical flow, the relationship between robust statistics and line processes, the early detection of motion discontinuities, and the role of representation in computer vision.
Developed motion estimation algorithms in the context of an autonomous Mars landing and nap-of-the-earth helicopter flight and studied the psychophysical implications of a temporal continuity assumption.
Research on spatial reasoning for robotic vehicle route planning and terrain analysis. Vision research including perceptual grouping, object-based translational motion processing, the integration of vision and control for an autonomous vehicle, object modeling using generalized cylinders, and the development of an object-oriented vision environment.
GTE Government Systems, (6/85-12/86) Mountain View, CA
Engineer, Artificial Intelligence Group.
Developed expert systems for multi-source data fusion and fault location.
Summer undergraduate researcher at UBC; park ranger's assistant; volunteer firefighter, busboy; and probably my worst job: cleaning dog kennels.
I am interested in motion. What does motion tell us about the structure of the world and how can we compute this from video? How do humans and animals move? How does the brain control complex movement? My work combines computer vision, graphics and neuroscience to develop new models and algorithms to capture and analyze the motion of the world.
My Computer Vision research addresses:
the estimation of scene structure and physical properties from video;
modeling the neural control of reaching and grasping;
novel neural decoding algorithms;
neural prostheses and cortical brain-machine interfaces;
markerless animal motion capture.
What is maybe unique about my work is the combination of these themes. For example, I study human motion from the inside (decoding neural activity in paralyzed humans) and the outside (with novel motion capture techniques).
Frank Wood, Associate Professor, Department of Engineering, Oxford
Thesis: Nonparametric Bayesian modeling of neural data. Department of Computer Science, Brown University
Hulya Yalcin, Assistant Professor, Department of Electronics and Communications Engineering, Istanbul Technical University, Turkey
Thesis: Implicit models of moving and static surfaces, Division of Engineering, Brown University, May 2004
Wei Wu, Associate Professor, Dept. of Statistics, Florida State
Thesis: Statistical models of neural coding in motor cortex, Division of Applied Math, Brown University. Co-supervised with David Mumford. May 2004.
Fernando De la Torre, Research Associate Professor, CMU and Facebook,
Thesis: Robust subspace learning for computer vision, La Salle School of Engineering. Universitat Ramon Llull, Barcelona, Spain. Jan. 2002
My old Brown site has several image sequences used in my older publications. These include some classic sequences such as Yosemite, the Pepsi can, the SRI tree sequence, and the Flower Garden sequence.
A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles behind Them
Sun, D., Roth, S., and Black, M.J. International Journal of Computer Vision (IJCV), 106(2):115-137, 2014. (pdf)
Secrets of optical flow estimation and their principles
Sun, D., Roth, S., and Black, M. J., IEEE Conf. on Computer Vision and Pattern Recog., CVPR, June 2010. (pdf)
This method implements many of the currently best known techniques for accurate optical flow and was once ranked #1 on the Middlebury evaluation (June 2010).
The software is made available for research purposes. Please read the copyright statement and contact me for commercial licensing.
2. Matlab implementation of the Black and Anandan dense optical flow method
The Matlab flow code is easier to use and more accurate than the original C code. The objective function being optimized is the same but the Matlab version uses more modern optimization methods:
The method in 1 above is more accurate and also implements Black and Anandan plus much more.
3. Original Black and Anandan method implemented in C
The optical flow software here has been used by a number of graphics companies to make special effects for movies. This software is provided for research purposes only; any sale or use for commercial purposes is strictly prohibited.
Contact me for the password to download the software, stating that it is for research purposes.
Please contact me if you wish to use this code for commercial purpose.
If you are a commercial enterprise and would like assistance in using optical flow in your application, please contact me at my consulting address firstname.lastname@example.org.
This is EXPERIMENTAL software. It is provided to illustrate some ideas in the robust estimation of optical flow. Use at your own risk. No warranty is implied by this distribution.
The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields,
Black, M. J. and Anandan, P., Computer Vision and Image Understanding, CVIU, 63(1), pp. 75-104, Jan. 1996. (pdf),(pdf from publisher)
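The idea of robust estimation behind this method can be illustrated with a robust penalty function. The following is a minimal sketch, assuming the Lorentzian penalty rho(x, sigma) = log(1 + x^2 / (2 sigma^2)) described in the paper above. It is written in Python for illustration only; the function names are hypothetical and none of it is taken from the distributed Matlab or C software.

```python
import math


def quadratic(residual):
    """Least-squares penalty: grows without bound, so outliers dominate the fit."""
    return residual ** 2


def lorentzian(residual, sigma):
    """Lorentzian robust penalty: log(1 + 0.5 * (residual / sigma) ** 2)."""
    return math.log(1.0 + 0.5 * (residual / sigma) ** 2)


def lorentzian_influence(residual, sigma):
    """Derivative of the Lorentzian penalty with respect to the residual."""
    return 2.0 * residual / (2.0 * sigma ** 2 + residual ** 2)


if __name__ == "__main__":
    # Illustrative residuals only; sigma = 1.0 is an arbitrary scale choice.
    for r in (0.5, 1.0, 5.0, 50.0):
        print(r, quadratic(r), lorentzian(r, 1.0), lorentzian_influence(r, 1.0))
```

Under this penalty the influence of a residual rises, peaks, and then falls off, so a measurement that violates the brightness or smoothness assumptions pulls on the solution less than it would under a quadratic penalty.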
Robust Principal Component Analysis (PCA)
Software is from the ICCV'2001 paper with Fernando De la Torre.
The code below provides a simple Matlab implementation of the Bayesian 3D person tracking system described in ECCV'00 and ICCV'01. It is too slow to be used to track the entire body but can be used to track various limbs and provides a basis for people who want to understand the methods better and extend them.
Stochastic tracking of 3D human figures using 2D image motion,
Sidenbladh, H., Black, M. J., and Fleet, D.J., European Conference on Computer Vision, D. Vernon (Ed.), Springer Verlag, LNCS 1843, Dublin, Ireland, pp. 702-718 June 2000. (postscript)(pdf), (abstract)
Software. (Note: if you uncompress and untar this on a PC using Winzip, the path names may be lost which will cause Matlab to fail when you load the .mat files. Instead uncompress/untar using gunzip and tar.)
In Computer Vision – ECCV 2014, 8695, pages: 154-169, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars ), Springer International Publishing, September 2014 (inproceedings)
Inverse graphics attempts to take sensor data and infer 3D geometry, illumination, materials, and motions such that a graphics renderer could realistically reproduce the observed scene. Renderers, however, are designed to solve the forward process of image synthesis. To go in the other direction, we propose an approximate differentiable renderer (DR) that explicitly models the relationship between changes in model parameters and image observations. We describe a publicly available OpenDR framework that makes it easy to express a forward graphics model and then automatically obtain derivatives with respect to the model parameters and to optimize over them. Built on a new autodifferentiation package and OpenGL, OpenDR provides a local optimization method that can be incorporated into probabilistic programming frameworks. We demonstrate the power and simplicity of programming with OpenDR by using it to solve the problem of estimating human body shape from Kinect depth and RGB data.
In Computer Vision – ECCV 2014, 8690, pages: 360-375, Lecture Notes in Computer Science, (Editors: D. Fleet and T. Pajdla and B. Schiele and T. Tuytelaars ), Springer International Publishing, September 2014 (inproceedings)
Intrinsic images such as albedo and shading are valuable for later stages of visual processing. Previous methods for extracting albedo and shading use either single images or images together with depth data. Instead, we define intrinsic video estimation as the problem of extracting temporally coherent albedo and shading from video alone. Our approach exploits the assumption that albedo is constant over time while shading changes slowly. Optical flow aids in the accurate estimation of intrinsic video by providing temporal continuity as well as putative surface boundaries. Additionally, we find that the estimated albedo sequence can be used to improve optical flow accuracy in sequences with changing illumination. The approach makes only weak assumptions about the scene and we show that it substantially outperforms existing single-frame intrinsic image methods. We evaluate this quantitatively on synthetic sequences as well on challenging natural sequences with complex geometry, motion, and illumination.
Kong, N., Gehler, P., Black, M. J.
In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 8673, pages: 593-600, Lecture Notes in Computer Science, (Editors: Golland, Polina and Hata, Nobuhiko and Barillot, Christian and Hornegger, Joachim and Howe, Robert), Springer International Publishing, September 2014 (inproceedings)
Detection of new or rapidly evolving melanocytic lesions is crucial for early diagnosis and treatment of melanoma. We propose a fully automated pre-screening system for detecting new lesions or changes in existing ones, on the order of 2-3 mm, over almost the entire body surface. Our solution is based on a multi-camera 3D stereo system. The system captures 3D textured scans of a subject at different times and then brings these scans into correspondence by aligning them with a learned, parametric, non-rigid 3D body model. This means that captured skin textures are in accurate alignment across scans, facilitating the detection of new or changing lesions. The integration of lesion segmentation with a deformable 3D body model is a key contribution that makes our approach robust to changes in illumination and subject pose.
Foster, J., Nuyujukian, P., Freifeld, O., Gao, H., Walker, R., Ryu, S., Meng, T., Murmann, B., Black, M. J., Shenoy, K.
J. of Neural Engineering, 11(4):046020, 2014 (article)
Objective: Motor neuroscience and brain-machine interface (BMI) design is based on examining how the brain controls voluntary movement, typically by recording neural activity and behavior from animal models. Recording technologies used with these animal models have traditionally limited the range of behaviors that can be studied, and thus the generality of science and engineering research. We aim to design a freely-moving animal model using neural and behavioral recording technologies that do not constrain movement.
Approach: We have established a freely-moving rhesus monkey model employing technology that transmits neural activity from an intracortical array using a head-mounted device and records behavior through computer vision using markerless motion capture. We demonstrate the excitability and utility of this new monkey model, including the first recordings from motor cortex while rhesus monkeys walk quadrupedally on a treadmill.
Main results: Using this monkey model, we show that multi-unit threshold-crossing neural activity encodes the phase of walking and that the average firing rate of the threshold crossings covaries with the speed of individual steps. On a population level, we find that neural state-space trajectories of walking at different speeds have similar rotational dynamics in some dimensions that evolve at the step rate of walking, yet robustly separate by speed in other state-space dimensions.
Significance: Freely-moving animal models may allow neuroscientists to examine a wider range of behaviors and can provide a flexible experimental paradigm for examining the neural mechanisms that underlie movement generation across behaviors and environments. For BMIs, freely-moving animal models have the potential to aid prosthetic design by examining how neural encoding changes with posture, environment, and other real-world context changes. Understanding this new realm of behavior in more naturalistic settings is essential for overall progress of basic motor neuroscience and for the successful translation of BMIs to people with paralysis.
ACM Transactions on Applied Perception for the Symposium on Applied Perception, 11(3):13:1-13:18, September 2014 (article)
The goal of this research was to investigate women’s sensitivity to changes in their perceived weight by altering the body mass index (BMI) of the participants’ personalized avatars displayed on a large-screen immersive display. We created the personalized avatars with a full-body 3D scanner that records both the participants’ body geometry and texture. We altered the weight of the personalized avatars to produce changes in BMI while keeping height, arm length and inseam fixed and exploited the correlation between body geometry and anthropometric measurements encapsulated in a statistical body shape model created from thousands of body scans. In a 2x2 psychophysical experiment, we investigated the relative importance of visual cues, namely shape (own shape vs. an average female body shape with equivalent height and BMI to the participant) and texture (own photo-realistic texture or checkerboard pattern texture) on the ability to accurately perceive own current body weight (by asking them ‘Is the avatar the same weight as you?’). Our results indicate that shape (where height and BMI are fixed) had little effect on the perception of body weight. Interestingly, the participants perceived their body weight veridically when they saw their own photo-realistic texture and significantly underestimated their body weight when the avatar had a checkerboard patterned texture. The range that the participants accepted as their own current weight was approximately a 0.83 to −6.05 BMI% change tolerance range around their perceived weight. Both the shape and the texture had an effect on the reported similarity of the body parts and the whole avatar to the participant’s body. This work has implications for new measures for patients with body image disorders, as well as researchers interested in creating personalized avatars for games, training applications or virtual reality.
In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 32(1):1152-1160, J. Machine Learning Research Workshop and Conf. and Proc., Beijing, China, June 2014 (inproceedings)
In applications of graphical models arising in domains such as computer vision and signal processing, we often seek the most likely configurations of high-dimensional, continuous variables. We develop a particle-based max-product algorithm which maintains a diverse set of posterior mode hypotheses, and is robust to initialization. At each iteration, the set of hypotheses at each node is augmented via stochastic proposals, and then reduced via an efficient selection algorithm. The integer program underlying our optimization-based particle selection minimizes errors in subsequent max-product message updates. This objective automatically encourages diversity in the maintained hypotheses, without requiring tuning of application-specific distances among hypotheses. By avoiding the stochastic resampling steps underlying particle sum-product algorithms, we also avoid common degeneracies where particles collapse onto a single hypothesis. Our approach significantly outperforms previous particle-based algorithms in experiments focusing on the estimation of human pose from single images.
ACM Transactions on Graphics, (Proc. SIGGRAPH), 33(4):52:1-52:11, ACM, New York, NY, July 2014 (article)
Modeling how the human body deforms during breathing is important for the realistic animation of lifelike 3D avatars. We learn a model of body shape deformations due to breathing for different breathing types and provide simple animation controls to render lifelike breathing regardless of body shape. We capture and align high-resolution 3D scans of 58 human subjects. We compute deviations from each subject’s mean shape during breathing, and study the statistics of such shape changes for different genders, body shapes, and breathing types. We use the volume of the registered scans as a proxy for lung volume and learn a novel non-linear model relating volume and breathing type to 3D shape deformations and pose changes. We then augment a SCAPE body model so that body shape is determined by identity, pose, and the parameters of the breathing model. These parameters provide an intuitive interface with which animators can synthesize 3D human avatars with realistic breathing motions. We also develop a novel interface for animating breathing using a spirometer, which measures the changes in breathing volume of a “breath actor.”
In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3794 -3801, Columbus, Ohio, USA, June 2014 (inproceedings)
New scanning technologies are increasing the importance of 3D mesh data and the need for algorithms that can reliably align it. Surface registration is important for building full 3D models from partial scans, creating statistical shape models, shape retrieval, and tracking. The problem is particularly challenging for non-rigid and articulated objects like human bodies. While the challenges of real-world data registration are not present in existing synthetic datasets, establishing ground-truth correspondences for real 3D scans is difficult. We address this with a novel mesh registration technique that combines 3D shape and appearance information to produce high-quality alignments. We define a new dataset called FAUST that contains 300 scans of 10 people in a wide range of poses together with an evaluation methodology. To achieve accurate registration, we paint the subjects with high-frequency textures and use an extensive validation process to ensure accurate ground truth. We find that current shape registration methods have trouble with this real-world data. The dataset and evaluation website are available for research purposes at http://faust.is.tue.mpg.de.
In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 3810 -3817, Columbus, Ohio, USA, June 2014 (inproceedings)
As the collection of large datasets becomes increasingly automated, the occurrence of outliers will increase – “big data” implies “big outliers”. While principal component analysis (PCA) is often used to reduce the size of data, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA do not scale beyond small-to-medium sized datasets. To address this, we introduce the Grassmann Average (GA), which expresses dimensionality reduction as an average of the subspaces spanned by the data. Because averages can be efficiently computed, we immediately gain scalability. GA is inherently more robust than PCA, but we show that they coincide for Gaussian data. We exploit that averages can be made robust to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. Robustness can be with respect to vectors (subspaces) or elements of vectors; we focus on the latter and use a trimmed average. The resulting Trimmed Grassmann Average (TGA) is particularly appropriate for computer vision because it is robust to pixel outliers. The algorithm has low computational complexity and minimal memory requirements, making it scalable to “big noisy data.” We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie.
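The element-wise trimmed average mentioned in this abstract can be sketched in a few lines. This is only the robust-averaging building block, not the TGA algorithm itself; the function names, the trim fraction, and the sample pixel values are assumptions made for illustration.

```python
def trimmed_mean(values, trim_fraction=0.25):
    """Average the values after discarding the lowest and highest trim_fraction."""
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    k = int(len(ordered) * trim_fraction)  # number dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def elementwise_trimmed_mean(vectors, trim_fraction=0.25):
    """Apply the trimmed mean independently to each element (e.g. each pixel)."""
    return [trimmed_mean(column, trim_fraction) for column in zip(*vectors)]


if __name__ == "__main__":
    # Four hypothetical two-pixel observations; the last one has an outlier pixel.
    frames = [[10, 20], [11, 21], [12, 19], [255, 20]]
    print(elementwise_trimmed_mean(frames))
```

Because trimming is applied per element, a single corrupted pixel is discarded without throwing away the rest of its observation.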
In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 1378 -1385, Columbus, Ohio, USA, June 2014 (inproceedings)
We consider the intersection of two research fields: transfer learning and statistics on manifolds. In particular, we consider, for manifold-valued data, transfer learning of tangent-space models such as Gaussian distributions, PCA, regression, or classifiers. Though one would hope to simply use ordinary R^n-transfer learning ideas, the manifold structure prevents it. We overcome this by basing our method on inner-product-preserving parallel transport, a well-known tool widely used in other problems of statistics on manifolds in computer vision. At first, this straightforward idea seems to suffer from an obvious shortcoming: Transporting large datasets is prohibitively expensive, hindering scalability. Fortunately, with our approach, we never transport data. Rather, we show how the statistical models themselves can be transported, and prove that for the tangent-space models above, the transport “commutes” with learning. Consequently, our compact framework, applicable to a large class of manifolds, is not restricted by the size of either the training or test sets. We demonstrate the approach by transferring PCA and logistic-regression models of real-world data involving 3D shapes and image descriptors.
In Proceedings Winter Conference on Applications of Computer Vision, pages: 83-90, IEEE , March 2014 (inproceedings)
Extracting anthropometric or tailoring measurements from 3D human body scans is important for applications such as virtual try-on, custom clothing, and online sizing. Existing commercial solutions identify anatomical landmarks on high-resolution 3D scans and then compute distances or circumferences on the scan. Landmark detection is sensitive to acquisition noise (e.g. holes) and these methods require subjects to adopt a specific pose. In contrast, we propose a solution we call model-based anthropometry. We fit a deformable 3D body model to scan data in one or more poses; this model-based fitting is robust to scan noise. This brings the scan into registration with a database of registered body scans. Then, we extract features from the registered model (rather than from the scan); these include limb lengths, circumferences, and statistical features of global shape. Finally, we learn a mapping from these features to measurements using regularized linear regression. We perform an extensive evaluation using the CAESAR dataset and demonstrate that the accuracy of our method outperforms state-of-the-art methods.
Homer, M., Perge, J., Black, M. J., Harrison, M., Cash, S., Hochberg, L.
IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(2):239-248, March 2014 (article)
Intracortical brain computer interfaces (iBCIs) decode intended movement from neural activity for the control of external devices such as a robotic arm. Standard approaches include a calibration phase to estimate decoding parameters. During iBCI operation, the statistical properties of the neural activity can depart from those observed during calibration, sometimes hindering a user’s ability to control the iBCI. To address this problem, we adaptively correct the offset terms within a Kalman filter decoder via penalized maximum likelihood estimation. The approach can handle rapid shifts in neural signal behavior (on the order of seconds) and requires no knowledge of the intended movement. The algorithm, called MOCA, was tested using simulated neural activity and evaluated retrospectively using data collected from two people with tetraplegia operating an iBCI. In 19 clinical research test cases, where a nonadaptive Kalman filter yielded relatively high decoding errors, MOCA significantly reduced these errors (10.6 ± 10.1%; p < 0.05, pairwise t-test). MOCA did not significantly change the error in the remaining 23 cases where a nonadaptive Kalman filter already performed well. These results suggest that MOCA provides more robust decoding than the standard Kalman filter for iBCIs.
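To see why the offset term matters, consider a scalar Kalman filter whose observation model includes a baseline offset. The sketch below is a toy, nonadaptive filter and not the MOCA algorithm: it does not perform the penalized maximum likelihood correction, and every parameter value and name in it is an assumption chosen for illustration.

```python
def kalman_step(x, p, z, a=1.0, w=0.01, h=2.0, offset=5.0, q=0.5):
    """One predict/update step for the model
         state:       x_t = a * x_{t-1} + noise (variance w)
         observation: z_t = h * x_t + offset + noise (variance q)
    Returns the updated state estimate and its variance."""
    # Predict.
    x_pred = a * x
    p_pred = a * p * a + w
    # Update, using the decoder's current belief about the offset.
    innovation = z - (h * x_pred + offset)
    s = h * p_pred * h + q
    gain = p_pred * h / s
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


def decode(observations, offset):
    """Run the filter over a sequence and return the final state estimate."""
    x, p = 0.0, 1.0
    for z in observations:
        x, p = kalman_step(x, p, z, offset=offset)
    return x


if __name__ == "__main__":
    # Hypothetical noise-free data: true state 1.0, true baseline 5.0 -> z = 7.0.
    print(decode([7.0] * 200, offset=5.0))
    # Baseline shifts to 6.0 (z = 8.0) while the decoder still assumes 5.0.
    print(decode([8.0] * 200, offset=5.0))
```

In this toy setting a baseline shift that the decoder does not know about is absorbed into the state estimate as a constant bias, which is the kind of error an adaptive offset correction is meant to remove.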
International Journal of Computer Vision (IJCV), 106(2):115-137, 2014 (article)
The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that "classical" flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. One key implementation detail is the median filtering of intermediate flow fields during optimization. While this improves the robustness of classical methods it actually leads to higher energy solutions, meaning that these methods are not optimizing the original objective function. To understand the principles behind this phenomenon, we derive a new objective function that formalizes the median filtering heuristic. This objective function includes a non-local smoothness term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that can better preserve motion details. To take advantage of the trend towards video in wide-screen format, we further introduce an asymmetric pyramid downsampling scheme that enables the estimation of longer range horizontal motions. The methods are evaluated on Middlebury, MPI Sintel, and KITTI datasets using the same parameter settings.
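The median filtering step described in this abstract can be sketched as a filter applied separately to the horizontal and vertical flow components. This is a minimal pure-Python illustration, not the authors' implementation; the window radius, the edge-replication border handling, and the function names are assumptions.

```python
from statistics import median


def median_filter_2d(field, radius=1):
    """Median-filter a 2D list of floats with a (2*radius+1)^2 window.
    Borders are handled by clamping indices (edge replication)."""
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                field[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
            ]
            out[r][c] = median(window)
    return out


def median_filter_flow(u, v, radius=1):
    """Filter the horizontal (u) and vertical (v) flow components independently."""
    return median_filter_2d(u, radius), median_filter_2d(v, radius)


if __name__ == "__main__":
    # Hypothetical 3x3 flow field with one outlier in the horizontal component.
    u = [[1.0, 1.0, 1.0], [1.0, 9.0, 1.0], [1.0, 1.0, 1.0]]
    v = [[0.0] * 3 for _ in range(3)]
    print(median_filter_flow(u, v))
```

In an incremental estimation scheme a filter of this kind would be applied to the intermediate flow field between optimization steps, removing isolated outliers before they propagate.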
(7), Max Planck Institute for Intelligent Systems, October 2013 (techreport)
We introduce Puppet Flow (PF), a layered model describing the optical flow of a person in a video sequence. We consider video frames composed of two layers: a foreground layer corresponding to a person, and a background layer.
We model the background as an affine flow field. The foreground layer, being a moving person, requires reasoning about the articulated nature of the human body. We thus represent the foreground layer with the Deformable Structures model (DS), a parametrized 2D part-based human body representation. We call the motion field defined through articulated motion and deformation of the DS model a Puppet Flow. By exploiting the DS representation, Puppet Flow is a parametrized optical flow field whose parameters are the person's pose, gender, and body shape.
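To make the background model concrete, the sketch below evaluates a standard six-parameter affine flow field, u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y, on a pixel grid. The parameter ordering and the function name are assumptions of this example. The report may use a different parametrization.

```python
def affine_flow(params, width, height):
    """Return flow[y][x] = (u, v) for a six-parameter affine motion model.

    params = (a1, a2, a3, a4, a5, a6), with
        u = a1 + a2 * x + a3 * y
        v = a4 + a5 * x + a6 * y
    The ordering is an assumption of this sketch.
    """
    a1, a2, a3, a4, a5, a6 = params
    return [
        [(a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y) for x in range(width)]
        for y in range(height)
    ]


# Pure translation: every pixel moves by the same vector.
translation = affine_flow((1.0, 0.0, 0.0, 0.5, 0.0, 0.0), width=4, height=3)

# Uniform expansion about the origin: flow grows with distance from (0, 0).
expansion = affine_flow((0.0, 0.1, 0.0, 0.0, 0.0, 0.1), width=4, height=3)
```

Six numbers describe the whole background motion, which is what makes the affine layer cheap to estimate compared with a dense per-pixel field.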
Systems, methods, and computer-readable storage media for simulating realistic clothing. The system generates a clothing deformation model for a clothing type, wherein the clothing deformation model factors a change of clothing shape due to rigid limb rotation, pose-independent body shape, and pose-dependent deformations. Next, the system generates a custom-shaped garment for a given body by mapping, via the clothing deformation model, body shape parameters to clothing shape parameters. The system then automatically dresses the given body with the custom-shaped garment.
In IEEE International Conference on Computer Vision (ICCV), pages: 3192-3199, IEEE, Sydney, Australia, December 2013 (inproceedings)
Although action recognition in videos is widely studied, current methods often fail on real-world datasets. Many recent approaches improve accuracy and robustness to cope with challenging video sequences, but it is often unclear what affects the results most. This paper attempts to provide insights based on a systematic performance evaluation using thoroughly-annotated data of human actions. We annotate human Joints for the HMDB dataset (J-HMDB). This annotation can be used to derive ground truth optical flow and segmentation. We evaluate current methods using this dataset and systematically replace the output of various algorithms with ground truth. This enables us to discover what is important – for example, should we work on improving flow algorithms, estimating human bounding boxes, or enabling pose estimation? In summary, we find that high-level pose features greatly outperform low/mid-level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information. We also find that the accuracy of a top-performing action recognition framework can be greatly increased by refining the underlying low/mid-level features; this suggests it is important to improve optical flow and human detection algorithms. Our analysis and the J-HMDB dataset should facilitate a deeper understanding of action recognition algorithms.
Homer, M., Harrison, M., Black, M. J., Perge, J., Cash, S., Friehs, G., Hochberg, L.
In 6th International IEEE EMBS Conference on Neural Engineering, pages: 715-718, San Diego, November 2013 (inproceedings)
Kalman filtering is a common method to decode neural signals from the motor cortex. In clinical research investigating the use of intracortical brain computer interfaces (iBCIs), the technique enabled people with tetraplegia to control assistive devices such as a computer or robotic arm directly from their neural activity. For reaching movements, the Kalman filter typically estimates the instantaneous endpoint velocity of the control device. Here, we analyzed attempted arm/hand movements by people with tetraplegia to control a cursor on a computer screen to reach several circular targets. A standard velocity Kalman filter is enhanced to additionally decode for the cursor’s position. We then mix decoded velocity and position to generate cursor movement commands. We analyzed data, offline, from two participants across six sessions. Root mean squared error between the actual and estimated cursor trajectory improved by 12.2 ± 10.5% (pairwise t-test, p<0.05) as compared to a standard velocity Kalman filter. The findings suggest that simultaneously decoding for intended velocity and position and using them both to generate movement commands can improve the performance of iBCIs.
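The abstract does not state how decoded velocity and position are mixed. One plausible reading, shown here only as a sketch, blends the decoded velocity with a corrective velocity that would move the cursor toward the decoded position within one time step. The blending weight `alpha`, the time step `dt`, and both function names are hypothetical.

```python
def mixed_command(v_dec, p_dec, p_cursor, alpha=0.2, dt=0.1):
    """Blend decoded velocity with a position-correction velocity.

    v_dec:    decoded velocity, e.g. (vx, vy)
    p_dec:    decoded position
    p_cursor: current cursor position
    alpha:    hypothetical mixing weight in [0, 1]; 0 uses velocity only
    dt:       time step in seconds (assumed)
    """
    return tuple(
        (1.0 - alpha) * v + alpha * (pd - pc) / dt
        for v, pd, pc in zip(v_dec, p_dec, p_cursor)
    )


def step(p_cursor, v_cmd, dt=0.1):
    """Advance the cursor by one time step using the commanded velocity."""
    return tuple(p + v * dt for p, v in zip(p_cursor, v_cmd))


# alpha = 0 reproduces a pure velocity decoder.
velocity_only = mixed_command((1.0, 0.0), (2.0, 2.0), (0.0, 0.0), alpha=0.0, dt=0.5)

# alpha = 1 drives the cursor to the decoded position in one step.
position_only = mixed_command((1.0, 0.0), (2.0, 2.0), (0.0, 0.0), alpha=1.0, dt=0.5)
reached = step((0.0, 0.0), position_only, dt=0.5)
```

Intermediate values of `alpha` let the position estimate correct accumulated drift while the velocity estimate still governs the moment-to-moment motion.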
In IEEE Conf. on Computer Vision and Pattern Recognition, (CVPR 2013), pages: 2451-2458, Portland, OR, June 2013 (inproceedings)
Layered models allow scene segmentation and motion estimation to be formulated together and to inform one another. Traditional layered motion methods, however, employ fairly weak models of scene structure, relying on locally connected Ising/Potts models which have limited ability to capture long-range correlations in natural scenes. To address this, we formulate a fully-connected layered model that enables global reasoning about the complicated segmentations of real objects. Optimization with fully-connected graphical models is challenging, and our inference algorithm leverages recent work on efficient mean field updates for fully-connected conditional random fields. These methods can be implemented efficiently using high-dimensional Gaussian filtering. We combine these ideas with a layered flow model, and find that the long-range connections greatly improve segmentation into figure-ground layers when compared with locally connected MRF models. Experiments on several benchmark datasets show that the method can recover fine structures and large occlusion regions, with good flow accuracy and much lower computational cost than previous locally-connected layered models.
Faces and bodies are complex structures, perception of which can play important roles in person identification and inference of emotional state. Face representations have been explored using behavioural adaptation: in particular, studies have shown that face aftereffects show relatively broad tuning for viewpoint, consistent with origin in a high-level structural descriptor far removed from the retinal image. Our goals were to determine first, if body aftereffects also showed a degree of viewpoint invariance, and second if they also showed pose invariance, given that changes in pose create even more dramatic changes in the 2-D retinal image. We used a 3-D model of the human body to generate headless body images, whose parameters could be varied to generate different body forms, viewpoints, and poses. In the first experiment, subjects adapted to varying viewpoints of either slim or heavy bodies in a neutral stance, followed by test stimuli that were all front-facing. In the second experiment, we used the same front-facing bodies in neutral stance as test stimuli, but compared adaptation from bodies in the same neutral stance to adaptation with the same bodies in different poses. We found that body aftereffects were obtained over substantial viewpoint changes, with no significant decline in aftereffect magnitude with increasing viewpoint difference between adapting and test images. Aftereffects also showed transfer across one change in pose but not across another. We conclude that body representations may have more viewpoint invariance than faces, and demonstrate at least some transfer across pose, consistent with a high-level structural description.
Keywords: aftereffect, shape, face, representation