Much of our work focuses on capturing or estimating human movement. For this we seek metrically accurate 3D movement with increasing levels of detail. We are interested, however, in more than the movement of the joints, the facial muscles, the fingers, etc. What we really seek is what is behind human movement; that is, the goals, motives, emotions, and plans that drive human movement.
A first step towards understanding human motion is being able to predict it. To that end, we train deep neural networks on motion capture (mocap) data to generate realistic human movements. We have shown, however, that simple baselines can outperform deep learning methods (RNNs) applied to existing mocap datasets, with results that remain unsatisfying and tend toward the mean motion. Using our MoSh method, we transform large datasets of mocap markers into a consistent body representation (SMPL). This gives us sufficient training data for deep networks to be effective.
Even so, existing mocap data is limited, which makes generalization hard. Human movement, however, is to some extent governed by the physics of the world. For example, a large person and a small person will do jumping jacks differently because of their mass and its distribution. To capture such variation efficiently, we turn to physics-based models of human movement.
Our interest goes deeper than prediction and physics, to the causes of movement. To study these, we track human movement together with speech so that we can relate intent and goals to behavior. This lets us model how speech drives facial motion and relate human movement to scripts in movies. We do this at both the pixel level and the level of 3D movement.
We argue that the capture, modeling, and synthesis of human motion form a virtuous cycle. If we can synthesize avatars behaving realistically in virtual worlds, then we must have modeled essential elements of human behavior. If we can build human and animal avatars that have goals, can see, and can act, then we can generate an effectively unlimited amount of training data that will let us better analyze the behavior of real people.