Michael J. Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. in computer science from Yale University (1992). After research at NASA Ames and post-doctoral research at the University of Toronto, he joined the Xerox Palo Alto Research Center in 1993 where he later managed the Image Understanding Area and founded the Digital Video Analysis group. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is a founding director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is an honorary professor at the University of Tübingen, a visiting professor at ETH Zürich, and an adjunct professor (research) at Brown University.
Black is a foreign member of the Royal Swedish Academy of Sciences. He is a recipient of the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision and the 2013 Helmholtz Prize for work that has stood the test of time. His work has won several paper awards including the IEEE Computer Society Outstanding Paper Award (CVPR'91). His work received Honorable Mention for the Marr Prize in 1999 and 2005. His early work on optical flow has been widely used in Hollywood films including for the Academy-Award-winning effects in “What Dreams May Come” and “The Matrix Reloaded.” He has contributed to several influential datasets including the Middlebury Flow dataset, HumanEva, and the Sintel dataset. He is a co-founder, science advisor, and member of the board of directors of Body Labs Inc., which is commercializing his team’s research on 3D human body shape.
Prof. Black's research interests in machine vision include optical flow estimation, 3D shape models, human shape and motion analysis, robust statistical methods, and probabilistic models of the visual world. In computational neuroscience his work focuses on probabilistic models of the neural code and applications of neural decoding in neural prosthetics.
Michael Black received his B.Sc. from the University of British Columbia (1985), his M.S. from Stanford (1989), and his Ph.D. from Yale University (1992). After post-doctoral research at the University of Toronto, he worked at Xerox PARC as a member of research staff and an area manager. From 2000 to 2010 he was on the faculty of Brown University in the Department of Computer Science (Assoc. Prof. 2000-2004, Prof. 2004-2010). He is one of the founding directors at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he leads the Perceiving Systems department. He is an honorary professor at the University of Tübingen, Visiting Professor at ETH Zürich, and Adjunct Professor (Research) at Brown University. His work has won several awards including the IEEE Computer Society Outstanding Paper Award (1991), Honorable Mention for the Marr Prize (1999 and 2005), the 2010 Koenderink Prize for Fundamental Contributions in Computer Vision, and the 2013 Helmholtz Prize for work that has stood the test of time. He is a foreign member of the Royal Swedish Academy of Sciences. He is also a co-founder, science advisor, and board member of Body Labs Inc.
Royal Swedish Academy of Sciences
Foreign member, Class for Engineering Sciences, since June 2015.
2013 Helmholtz Prize, for the paper: Black, M. J., and Anandan, P., "A framework for the robust estimation of optical flow," IEEE International Conference on Computer Vision, ICCV, pages 231-236, Berlin, Germany. May 1993.
2010 Koenderink Prize for Fundamental Contributions in Computer Vision,
with Sidenbladh, H. and Fleet, D. J. for the paper "Stochastic tracking of 3D human figures using 2D image motion," European Conference on Computer Vision, 2000.
"Dataset Award" at the Eurographics Symposium on Geometry Processing 2016, with F. Bogo, J. Romero, and M. Loper, for the paper "FAUST: Dataset and evaluation for 3D mesh registration," CVPR 2014.
Best Paper Award, International Conference on 3D Vision (3DV), 2015, with A. O. Ulusoy and A. Geiger, for the paper "Towards Probabilistic Volumetric Reconstruction using Ray Potentials."
Best Paper Award, INI-Graphics Net, 2008, First Prize Winner of Category Research,
with S. Roth for the paper "Steerable random fields."
Best Paper Award, Fourth International Conference on Articulated Motion and Deformable Objects (AMDO-e 2006), with L. Sigal for the paper "Predicting 3D people from 2D pictures."
Marr Prize, Honorable Mention, Int. Conf. on Computer Vision, ICCV-2005, Beijing, China, Oct. 2005 with S. Roth for the paper "On the spatial statistics of optical flow."
Marr Prize, Honorable Mention, Int. Conf. on Computer Vision, ICCV-99, Corfu, Greece, Sept. 1999 with D. J. Fleet for the paper "Probabilistic detection and tracking of motion discontinuities."
IEEE Computer Society, Outstanding Paper Award, Conference on Computer Vision and Pattern Recognition, Maui, Hawaii, June 1991 with P. Anandan for the paper "Robust dynamic motion estimation over time."
Commendation and Chief's Award, Henrico County Division of Police,
County of Henrico, Virginia, April 19, 2007.
University of Maryland, Invention of the Year, 1995, "Tracking and Recognizing Facial Expressions," with Y. Yacoob.
University of Toronto, Computer Science Students' Union Teaching Award for 1992-1993.
My research addressed the problem of estimating and explaining motion in image sequences. I developed methods for detecting and tracking 2D and 3D human motion, including the introduction of particle filtering for 3D human tracking and belief propagation for 3D human pose estimation. I worked on probabilistic models of images, including the high-order Field of Experts model. I worked on 3D human shape estimation from images and video and developed applications of this technology. I also developed mathematical models for decoding neural signals. This included the first uses of particle filtering and Kalman filtering for decoding motor cortical neural activity and the first point-and-click cortical neural brain-machine-interface for people with paralysis.
Research included modeling image changes (motion, illumination, specularity, occlusion, etc.) in video as a mixture of causes. I developed methods of motion explanation; that is, the extraction of mid-level or high-level concepts from motion. This included the modeling and recognition of motion "features" (occlusion boundaries, moving bars, etc.), human facial expressions and gestures, and motion "texture" (plants, fire, water, etc.). I applied these methods to problems in video indexing, motion for video annotation, teleconferencing, and gestural user interfaces. Other research included robust learning of image-based models, regularization with transparency, anisotropic diffusion, and the recovery of multiple shapes from transparent textures.
Research included the application of mixture models to optical flow, detection and tracking of surface discontinuities using motion information, and robust surface recovery in dynamic environments.
Yale University, (9/89-8/92) New Haven, CT
Research Assistant, Department of Computer Science.
Research in the recovery of optical flow, incremental estimation, temporal continuity, applications of robust statistics to optical flow, the relationship between robust statistics and line processes, the early detection of motion discontinuities, and the role of representation in computer vision.
Developed motion estimation algorithms in the context of an autonomous Mars landing and nap-of-the-earth helicopter flight and studied the psychophysical implications of a temporal continuity assumption.
Research on spatial reasoning for robotic vehicle route planning and terrain analysis. Vision research including perceptual grouping, object-based translational motion processing, the integration of vision and control for an autonomous vehicle, object modeling using generalized cylinders, and the development of an object-oriented vision environment.
GTE Government Systems, (6/85-12/86) Mountain View, CA
Engineer, Artificial Intelligence Group.
Developed expert systems for multi-source data fusion and fault location.
Summer undergraduate researcher at UBC; park ranger's assistant; volunteer firefighter; busboy; and probably my worst job: cleaning dog kennels.
I am interested in motion. What does motion tell us about the structure of the world and how can we compute this from video? How do humans and animals move? How does the brain control complex movement? My work combines computer vision, graphics and neuroscience to develop new models and algorithms to capture and analyze the motion of the world.
My Computer Vision research addresses:
the estimation of scene structure and physical properties from video;
modeling the neural control of reaching and grasping;
novel neural decoding algorithms;
neural prostheses and cortical brain-machine interfaces;
markerless animal motion capture.
What is maybe unique about my work is the combination of these themes. For example, I study human motion from the inside (decoding neural activity in paralyzed humans) and the outside (with novel motion capture techniques).
Frank Wood, Associate Professor, Department of Engineering, Oxford
Thesis: Nonparametric Bayesian modeling of neural data. Department of Computer Science, Brown University
Hulya Yalcin, Assistant Professor, Department of Electronics and Communications Engineering, Istanbul Technical University, Turkey
Thesis: Implicit models of moving and static surfaces, Division of Engineering, Brown University, May 2004
Wei Wu, Associate Professor, Dept. of Statistics, Florida State
Thesis: Statistical models of neural coding in motor cortex, Division of Applied Math, Brown University. Co-supervised with David Mumford. May 2004.
Fernando De la Torre, Research Associate Professor, CMU and Facebook,
Thesis: Robust subspace learning for computer vision, La Salle School of Engineering. Universitat Ramon Llull, Barcelona, Spain. Jan. 2002
My old Brown site has several image sequences used in my older publications. These include some classic sequences such as Yosemite, the Pepsi can, the SRI tree sequence, and the Flower Garden sequence.
A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles behind Them
Sun, D., Roth, S., and Black, M.J. International Journal of Computer Vision (IJCV), 106(2):115-137, 2014. (pdf)
Secrets of optical flow estimation and their principles
Sun, D., Roth, S., and Black, M. J., IEEE Conf. on Computer Vision and Pattern Recog., CVPR, June 2010. (pdf)
This method implements many of the currently best known techniques for accurate optical flow and was once ranked #1 on the Middlebury evaluation (June 2010).
The software is made available for research purposes. Please read the copyright statement and contact me for commercial licensing.
2. Matlab implementation of the Black and Anandan dense optical flow method
The Matlab flow code is easier to use and more accurate than the original C code. The objective function being optimized is the same but the Matlab version uses more modern optimization methods:
The method in 1 above is more accurate and also implements Black and Anandan plus much more.
3. Original Black and Anandan method implemented in C
The optical flow software here has been used by a number of graphics companies to make special effects for movies. This software is provided for research purposes only; any sale or use for commercial purposes is strictly prohibited.
Contact me for the password to download the software, stating that it is for research purposes.
Please contact me if you wish to use this code for commercial purpose.
If you are a commercial enterprise and would like assistance in using optical flow in your application, please contact me at my consulting address firstname.lastname@example.org.
This is EXPERIMENTAL software. It is provided to illustrate some ideas in the robust estimation of optical flow. Use at your own risk. No warranty is implied by this distribution.
The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields,
Black, M. J. and Anandan, P., Computer Vision and Image Understanding, CVIU, 63(1), pp. 75-104, Jan. 1996. (pdf),(pdf from publisher)
Robust Principal Component Analysis (PCA)
Software is from the ICCV'2001 paper with Fernando De la Torre.
The code below provides a simple Matlab implementation of the Bayesian 3D person tracking system described in ECCV'00 and ICCV'01. It is too slow to be used to track the entire body but can be used to track various limbs and provides a basis for people who want to understand the methods better and extend them.
Stochastic tracking of 3D human figures using 2D image motion,
Sidenbladh, H., Black, M. J., and Fleet, D.J., European Conference on Computer Vision, D. Vernon (Ed.), Springer Verlag, LNCS 1843, Dublin, Ireland, pp. 702-718 June 2000. (postscript)(pdf), (abstract)
Software. (Note: if you uncompress and untar this on a PC using Winzip, the path names may be lost which will cause Matlab to fail when you load the .mat files. Instead uncompress/untar using gunzip and tar.)
Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics), 2017 (article)
We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker Sparse Inertial Poser (SIP) enables motion capture using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall.
In Computer Vision – ECCV 2016, Lecture Notes in Computer Science, Springer International Publishing, October 2016 (inproceedings)
We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occlusion, clothing, lighting, and the inherent ambiguity in inferring 3D from 2D. To solve this, we first use a recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D body joint locations. We then fit (top-down) a recently published statistical body shape model, called SMPL, to the 2D joints. We do so by minimizing an objective function that penalizes the error between the projected 3D model joints and detected 2D joints. Because SMPL captures correlations in human shape across the population, we are able to robustly fit it to very little data. We further leverage the 3D model to prevent solutions that cause interpenetration. We evaluate our method, SMPLify, on the Leeds Sports, HumanEva, and Human3.6M datasets, showing superior pose accuracy with respect to the state of the art.
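The data term described in this abstract (the error between projected 3D model joints and detected 2D joints) can be illustrated with a minimal sketch. It is not the SMPLify implementation: the pinhole camera, the per-joint confidence weights, the squared-distance penalty, and all names and numbers below are assumptions chosen for illustration, and the optimization over body shape and pose as well as the interpenetration term are left out.

```python
def project(point, focal, center):
    """Pinhole projection of a camera-space 3D point (z > 0) to pixel coordinates."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])


def joint_reprojection_error(joints_3d, joints_2d, weights, focal, center):
    """Weighted sum of squared pixel distances between projected and detected joints."""
    total = 0.0
    for p3, p2, w in zip(joints_3d, joints_2d, weights):
        u, v = project(p3, focal, center)
        total += w * ((u - p2[0]) ** 2 + (v - p2[1]) ** 2)
    return total


# Hypothetical example: two model joints, two detections, one low-confidence detection.
model_joints = [(0.0, 0.0, 2.0), (0.2, -0.5, 2.0)]
detections = [(320.0, 240.0), (375.0, 115.0)]
confidences = [1.0, 0.5]
error = joint_reprojection_error(
    model_joints, detections, confidences, focal=500.0, center=(320.0, 240.0)
)
```

In this example the first joint projects exactly onto its detection, and the second is 5 pixels off with weight 0.5. A fitting procedure would adjust the model parameters that generate `model_joints` to reduce this value.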
Psychological Science, 27(11):1486-1497, November 2016, (article)
Brief verbal descriptions of bodies (e.g. curvy, long-legged) can elicit vivid mental images. The ease with which we create these mental images belies the complexity of three-dimensional body shapes. We explored the relationship between body shapes and body descriptions and show that a small number of words can be used to generate categorically accurate representations of three-dimensional bodies. The dimensions of body shape variation that emerged in a language-based similarity space were related to major dimensions of variation computed directly from three-dimensional laser scans of 2094 bodies. This allowed us to generate three-dimensional models of people in the shape space using only their coordinates on analogous dimensions in the language-based description space. Human descriptions of photographed bodies and their corresponding models matched closely. The natural mapping between the spaces illustrates the role of language as a concise code for body shape, capturing perceptually salient global and local body features.
ACM Trans. Graph. (Proc. SIGGRAPH), 35(4):54:1-54:14, July 2016 (article)
Realistic, metrically accurate, 3D human avatars are useful for games, shopping, virtual reality, and health applications. Such avatars are not in wide use because solutions for creating them from high-end scanners, low-cost range cameras, and tailoring measurements all have limitations. Here we propose a simple solution and show that it is surprisingly accurate. We use crowdsourcing to generate attribute ratings of 3D body shapes corresponding to standard linguistic descriptions of 3D shape. We then learn a linear function relating these ratings to 3D human shape parameters. Given an image of a new body, we again turn to the crowd for ratings of the body shape. The collection of linguistic ratings of a photograph provides remarkably strong constraints on the metric 3D shape. We call the process crowdshaping and show that our Body Talk system produces shapes that are perceptually indistinguishable from bodies created from high-resolution scans and that the metric accuracy is sufficient for many tasks. This makes body “scanning” practical without a scanner, opening up new applications including database search, visualization, and extracting avatars from books.
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)
In this paper, we propose a non-local structured prior for volumetric multi-view 3D reconstruction. Towards this goal, we present a novel Markov random field model based on ray potentials in which assumptions about large 3D surface patches such as planarity or Manhattan world constraints can be efficiently encoded as probabilistic priors. We further derive an inference algorithm that reasons jointly about voxels, pixels and image segments, and estimates marginal distributions of appearance, occupancy, depth, normals and planarity. Key to tractable inference is a novel hybrid representation that spans both voxel and pixel space and that integrates non-local information from 2D image segmentations in a principled way. We compare our non-local prior to commonly employed local smoothness assumptions and a variety of state-of-the-art volumetric reconstruction baselines on challenging outdoor scenes with textureless and reflective surfaces. Our experiments indicate that regularizing over larger distances has the potential to resolve ambiguities where local regularizers fail.
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)
Video object segmentation is challenging due to fast moving objects, deforming shapes, and cluttered backgrounds. Optical flow can be used to propagate an object segmentation over time but, unfortunately, flow is often inaccurate, particularly around object boundaries. Such boundaries are precisely where we want our segmentation to be accurate. To obtain accurate segmentation across time, we propose an efficient algorithm that considers video segmentation and optical flow estimation simultaneously. For video segmentation, we formulate a principled, multiscale, spatio-temporal objective function that uses optical flow to propagate information between frames. For optical flow estimation, particularly at object boundaries, we compute the flow independently in the segmented regions and recompose the results. We call the process object flow and demonstrate the effectiveness of jointly optimizing optical flow and video segmentation using an iterative scheme. Experiments on the SegTrack v2 and Youtube-Objects datasets show that the proposed algorithm performs favorably against the other state-of-the-art methods.
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)
Occlusion boundaries contain rich perceptual information about the underlying scene structure. They also provide important cues in many visual perception tasks such as scene understanding, object recognition, and segmentation. In this paper, we improve occlusion boundary detection via enhanced exploration of contextual information (e.g., local structural boundary patterns, observations from surrounding regions, and temporal context), and in doing so develop a novel approach based on convolutional neural networks (CNNs) and conditional random fields (CRFs). Experimental results demonstrate that our detector significantly outperforms the state-of-the-art (e.g., improving the F-measure from 0.62 to 0.71 on the commonly used CMU benchmark). Last but not least, we empirically assess the roles of several important components of the proposed detector, so as to validate the rationale behind this approach.
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)
Existing optical flow methods make generic, spatially homogeneous, assumptions about the spatial structure of the flow. In reality, optical flow varies across an image depending on object class. Simply put, different objects move differently. Here we exploit recent advances in static semantic scene segmentation to segment the image into objects of different types. We define different models of image motion in these regions depending on the type of object. For example, we model the motion on roads with homographies, vegetation with spatially smooth flow, and independently moving objects like cars and planes with affine motion plus deviations. We then pose the flow estimation problem using a novel formulation of localized layers, which addresses limitations of traditional layered models for dealing with complex scene motion. Our semantic flow method achieves the lowest error of any published monocular method in the KITTI-2015 flow benchmark and produces qualitatively better flow and segmentation than recent top methods on a wide range of natural videos.
In 11th Int. Conf. on Computer Graphics Theory and Applications (GRAPP), February 2016 (inproceedings)
Advances in 3D scanning technology allow us to create realistic virtual avatars from full body 3D scan data. However, negative reactions to some realistic computer generated humans suggest that this approach might not always provide the most appealing results. Using styles derived from existing popular character designs, we present a novel automatic stylization technique for body shape and colour information based on a statistical 3D model of human bodies. We investigate whether such stylized body shapes result in increased perceived appeal with two different experiments: One focuses on body shape alone, the other investigates the additional role of surface colour and lighting. Our results consistently show that the most appealing avatar is a partially stylized one. Importantly, avatars with high stylization or no stylization at all were rated to have the least appeal. The inclusion of colour information and improvements to render quality had no significant effect on the overall perceived appeal of the avatars, and we observe that the body shape primarily drives the change in appeal ratings. For body scans with colour information, we found that a partially stylized avatar was most effective, increasing average appeal ratings by approximately 34%.
IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), December 2015 (article)
In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.
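The idea of averaging one-dimensional subspaces can be sketched in a few lines. The code below is an illustrative reading of the abstract, not the released source code. It assumes zero-mean data and treats each observation and its negation as the same subspace, so every point is sign-flipped to agree with the current estimate before averaging. Summing the flipped points is the same as weighting each unit direction by its norm. It uses a plain average only; the robust (trimmed) averaging of RGA/TGA is not implemented. The function name, iteration limit, and test data are assumptions.

```python
import math
import random


def grassmann_average(data, iters=50, seed=0):
    """Return a unit vector approximating the average 1-D subspace of zero-mean data."""
    rng = random.Random(seed)
    dim = len(data[0])
    q = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in q))
    q = [v / norm for v in q]
    prev_signs = None
    for _ in range(iters):
        # x and -x span the same line, so flip each point toward the current estimate.
        signs = [1.0 if sum(a * b for a, b in zip(x, q)) >= 0 else -1.0 for x in data]
        if signs == prev_signs:
            break  # no sign changed, so the average would not change either
        total = [0.0] * dim
        for s, x in zip(signs, data):
            for j in range(dim):
                total[j] += s * x[j]
        norm = math.sqrt(sum(v * v for v in total))
        if norm == 0.0:
            break
        q = [v / norm for v in total]
        prev_signs = signs
    return q


# Hypothetical Gaussian data whose variance is largest along the first axis.
rng = random.Random(1)
points = [[rng.gauss(0, 5), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
axis = grassmann_average(points)
```

For this synthetic Gaussian data, the returned `axis` should align closely with the first coordinate axis, up to sign. This matches the abstract's statement that the average subspace corresponds to the leading principal component.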
In International Conference on Computer Vision (ICCV), pages: 2300-2308, December 2015 (inproceedings)
We accurately estimate the 3D geometry and appearance of the human body from a monocular RGB-D sequence of a user moving freely in front of the sensor. Range data in each frame is first brought into alignment with a multi-resolution 3D body model in a coarse-to-fine process. The method then uses geometry and image texture over time to obtain accurate shape, pose, and appearance information despite unconstrained motion, partial views, varying resolution, occlusion, and soft tissue deformation. Our novel body model has variable shape detail, allowing it to capture faces with a high-resolution deformable head model and body shape with lower-resolution. Finally we combine range data from an entire sequence to estimate a high-resolution displacement map that captures fine shape details. We compare our recovered models with high-resolution scans from a professional system and with avatars created by a commercial product. We extract accurate 3D avatars from challenging motion sequences and even capture soft tissue dynamics.
In IEEE International Conference on Computer Vision (ICCV), pages: 3514-3522, December 2015 (inproceedings)
We formulate the estimation of dense depth maps from video sequences as a problem of intrinsic image estimation. Our approach synergistically integrates the estimation of multiple intrinsic images including depth, albedo, shading, optical flow, and surface contours. We build upon an example-based framework for depth estimation that uses label transfer from a database of RGB and depth pairs. We combine this with a method that extracts consistent albedo and shading from video. In contrast to raw RGB values, albedo and shading provide a richer, more physical, foundation for depth transfer. Additionally we train a new contour detector to predict surface boundaries from albedo, shading, and pixel values and use this to improve the estimation of depth boundaries. We also integrate sparse structure from motion with our method to improve the metric accuracy of the estimated depth maps. We evaluate our Intrinsic Depth method quantitatively by estimating depth from videos in the NYU RGB-D and SUN3D datasets. We find that combining the estimation of multiple intrinsic images improves depth estimation relative to the baseline method.
ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1-248:16, ACM, New York, NY, October 2015 (article)
We present a learned model of human body shape and pose-dependent shape variation that is more accurate than previous models and is compatible with existing graphics pipelines. Our Skinned Multi-Person Linear model (SMPL) is a skinned vertex-based model that accurately represents a wide variety of body shapes in natural human poses. The parameters of the model are learned from data including the rest pose template, blend weights, pose-dependent blend shapes, identity-dependent blend shapes, and a regressor from vertices to joint locations. Unlike previous models, the pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices. This simple formulation enables training the entire model from a relatively large number of aligned 3D meshes of different people in different poses. We quantitatively evaluate variants of SMPL using linear or dual-quaternion blend skinning and show that both are more accurate than a Blend-SCAPE model trained on the same data. We also extend SMPL to realistically model dynamic soft-tissue deformations. Because it is based on blend skinning, SMPL is compatible with existing rendering engines and we make it available for research purposes.
In Proc. ACM SIGGRAPH Symposium on Applied Perception, SAP’15, pages: 7-14, ACM, New York, NY, September 2015 (inproceedings)
We investigated the influence of body shape and pose on the perception of physical strength and social power for male virtual characters. In the first experiment, participants judged the physical strength of varying body shapes, derived from a statistical 3D body model. Based on these ratings, we determined three body shapes (weak, average, and strong) and animated them with a set of power poses for the second experiment. Participants rated how strong or powerful they perceived virtual characters of varying body shapes that were displayed in different poses. Our results show that perception of physical strength was mainly driven by the shape of the body. However, the social attribute of power was influenced by an interaction between pose and shape. Specifically, the effect of pose on power ratings was greater for weak body shapes. These results demonstrate that a character with a weak shape can be perceived as more powerful when in a high-power pose.
In 3D Vision (3DV), 2015 3rd International Conference on, pages: 10-18, Lyon, October 2015 (inproceedings)
This paper presents a novel probabilistic foundation for volumetric 3-d reconstruction. We formulate the problem as inference in a Markov random field, which accurately captures the dependencies between the occupancy and appearance of each voxel, given all input images. Our main contribution is an approximate highly parallelized discrete-continuous inference algorithm to compute the marginal distributions of each voxel's occupancy and appearance. In contrast to the MAP solution, marginals encode the underlying uncertainty and ambiguity in the reconstruction. Moreover, the proposed algorithm allows for a Bayes optimal prediction with respect to a natural reconstruction loss. We compare our method to two state-of-the-art volumetric reconstruction algorithms on three challenging aerial datasets with LIDAR ground truth. Our experiments demonstrate that the proposed algorithm compares favorably in terms of reconstruction accuracy and the ability to expose reconstruction uncertainty.
In Pattern Recognition, Proc. 37th German Conference on Pattern Recognition (GCPR), LNCS 9358, pages: 412-423, Springer, 2015 (inproceedings)
We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance but unlike depth it can be directly computed from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving, there is no optical flow and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and velocities over time and a regression method to predict 2D body pose from part centers. No range sensor is required and FlowCap estimates 2D human pose from monocular video sources containing human motion. Such sources include hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.
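A small sketch shows how a Kalman filter can carry a body part's position and velocity through frames with no detection. It is a generic constant-velocity filter for one coordinate, not the FlowCap implementation; the state layout, the noise variances, and the example measurements are all assumptions made for illustration.

```python
def kalman_step(state, cov, measurement, dt=1.0, process_var=1e-2, meas_var=1.0):
    """One predict/update cycle of a constant-velocity Kalman filter (1-D position).

    state = (position, velocity); cov = 2x2 covariance as nested tuples.
    Pass measurement=None when nothing was detected in the frame.
    """
    p, v = state
    (a, b), (c, d) = cov
    # Predict: x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
    p = p + dt * v
    a, b, c, d = (
        a + dt * (b + c) + dt * dt * d + process_var,
        b + dt * d,
        c + dt * d,
        d + process_var,
    )
    if measurement is None:
        return (p, v), ((a, b), (c, d))
    # Update with a position-only observation, H = [1, 0].
    s = a + meas_var
    k0, k1 = a / s, c / s
    r = measurement - p
    p, v = p + k0 * r, v + k1 * r
    cov = (((1 - k0) * a, (1 - k0) * b), (c - k1 * a, d - k1 * b))
    return (p, v), cov


# Hypothetical part center moving one unit per frame, then two frames without detections.
state, cov = (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))
for z in [1.0, 2.0, 3.0, 4.0, 5.0, None, None]:
    state, cov = kalman_step(state, cov, z)
position, velocity = state
```

During the two frames without a measurement, the filter keeps advancing the position using the estimated velocity, and the position uncertainty `cov[0][0]` grows until a new detection arrives.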
Vargas-Irwin, C., Franquemont, L., Black, M. J., Donoghue, J.
Journal of Neuroscience, 35(30):10888-10897, July 2015 (article)
Neural activity in ventral premotor cortex (PMv) has been associated with the process of matching perceived objects with the motor commands needed to grasp them. It remains unclear how PMv networks can flexibly link percepts of objects affording multiple grasp options into a final desired hand action. Here, we use a relational encoding approach to track the functional state of PMv neuronal ensembles in macaque monkeys through the process of passive viewing, grip planning, and grasping movement execution. We used objects affording multiple possible grip strategies. The task included separate instructed delay periods for object presentation and grip instruction. This approach allowed us to distinguish responses elicited by the visual presentation of the objects from those associated with selecting a given motor plan for grasping. We show that PMv continuously incorporates information related to object shape and grip strategy as it becomes available, revealing a transition from a set of ensemble states initially most closely related to objects, to a new set of ensemble patterns reflecting unique object-grip combinations. These results suggest that PMv dynamically combines percepts, gradually navigating toward activity patterns associated with specific volitional actions, rather than directly mapping perceptual object properties onto categorical grip representations. Our results support the idea that PMv is part of a network that dynamically computes motor plans from perceptual information.
Significance Statement: The present work demonstrates that the activity of groups of neurons in primate ventral premotor cortex reflects information related to visually presented objects, as well as the motor strategy used to grasp them, linking individual objects to multiple possible grips. PMv could provide useful control signals for neuroprosthetic assistive devices designed to interact with objects in a flexible way.