Bodies in computer vision have often been an afterthought. Human pose is typically represented by 10-12 body joints in 2D or 3D. This is inspired by Johansson's moving light displays, which showed that some human actions can be recognized from the major joints of the body alone. But the joints don't capture everything. The skeletal structure of the body is also a popular representation, but it is only approximate and is never actually observed in images.
In our work we have focused on 3D body shape, represented as a triangulated mesh. Shape gives us more information about a person related to their health, age, fitness, and clothing size. Shape also matters because our bodies occupy space, and this is critical to our physical interactions with the world: we can't interpenetrate objects and they can't interpenetrate us.
It has taken the field a few years to catch on to this idea, but our SMPL [ ] body model is now widely used in research and industry. It is simple, efficient, posable, and compatible with most graphics packages. It is also differentiable and easy to integrate with optimization or deep learning methods.
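To make the "simple and posable" claim concrete, here is a minimal sketch of the two ingredients at the heart of SMPL-style models: shape blend shapes (linear offsets from a template mesh) followed by linear blend skinning (LBS) to pose the mesh. This is a toy illustration with made-up numbers and a 3-vertex "mesh", not the actual SMPL model or its learned parameters; every array below is a hypothetical stand-in.

```python
import numpy as np

# Toy template "mesh" (N=3 vertices) and an illustrative shape basis.
template = np.array([[0., 0., 0.],
                     [0., 1., 0.],
                     [0., 2., 0.]])           # (N, 3)
shape_basis = np.array([[[0.1, 0., 0.],
                         [0.0, 0., 0.],
                         [0.1, 0., 0.]]])     # (n_betas=1, N, 3), invented values
betas = np.array([2.0])                       # shape coefficients

# Shape blend shapes: linear offsets added to the template.
shaped = template + np.einsum('b,bnd->nd', betas, shape_basis)

# Skinning weights attach each vertex to J=2 joints (rows sum to 1).
weights = np.array([[1.0, 0.0],
                    [0.5, 0.5],
                    [0.0, 1.0]])              # (N, J)

def rigid(R, p):
    """4x4 transform that rotates by R about the point p."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p - R @ p
    return T

# Joint 0 stays put; joint 1 bends 90 degrees about z at the point (0,1,0).
Rz90 = np.array([[0., -1., 0.],
                 [1.,  0., 0.],
                 [0.,  0., 1.]])
transforms = np.stack([np.eye(4),
                       rigid(Rz90, np.array([0., 1., 0.]))])  # (J, 4, 4)

# Linear blend skinning: blend the joint transforms per vertex, then
# apply each blended transform to the shaped vertex (homogeneous coords).
V = np.hstack([shaped, np.ones((len(shaped), 1))])
T_per_vertex = np.einsum('nj,jab->nab', weights, transforms)
posed = np.einsum('nab,nb->na', T_per_vertex, V)[:, :3]
```

Because every step is a linear map or a matrix product, the whole pipeline is differentiable with respect to the shape and pose parameters, which is what makes such models easy to drop into optimization or deep learning frameworks.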
While popular, SMPL has drawbacks: pose deformations are non-local, the feet are noisy, the face doesn't move, the hands are rigid, and there is no clothing or hair. We are addressing all of these in ongoing work (see the theme on Clothing and the projects on Faces and Hands). Our latest work puts bodies, faces, and hands together in a single simple model that can be fit to data or animated. Like all our body models, we train it from scans of people to capture the realism and statistics of the population.
Such models provide the foundation for our analysis of human movement, emotion, and behavior.