

2019


Towards Geometric Understanding of Motion

Ranjan, A.

University of Tübingen, December 2019 (phdthesis)

Abstract

The motion of the world is inherently dependent on the spatial structure of the world and its geometry. Therefore, classical optical flow methods try to model this geometry to solve for the motion. However, recent deep learning methods take a completely different approach. They try to predict optical flow by learning from labelled data. Although deep networks have shown state-of-the-art performance on classification problems in computer vision, they have not been as effective in solving optical flow. The key reason is that deep learning methods do not explicitly model the structure of the world in a neural network, and instead expect the network to learn about the structure from data. We hypothesize that it is difficult for a network to learn about motion without any constraint on the structure of the world. Therefore, we explore several approaches to explicitly model the geometry of the world and its spatial structure in deep neural networks.

The spatial structure in images can be captured by representing it at multiple scales. To represent multiple scales of images in deep neural nets, we introduce a Spatial Pyramid Network (SPyNet). Such a network can leverage global information for estimating large motions and local information for estimating small motions. We show that SPyNet significantly improves over previous optical flow networks while also being the smallest and fastest neural network for motion estimation. SPyNet has 97% fewer model parameters than previous methods and is more accurate.
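The coarse-to-fine idea (large motions estimated on coarse levels, refined by small corrections on finer levels) can be sketched as follows. This is a minimal illustration under assumptions that do not come from the abstract: images are plain nested lists with sides divisible by 2**(levels-1), downsampling averages 2x2 blocks, resampling is nearest-neighbour, and `residual_fn` is a hypothetical stand-in for the trained per-level network.

```python
# Sketch of SPyNet-style coarse-to-fine flow estimation (assumptions noted inline).
from typing import Callable, List, Tuple

Image = List[List[float]]
Flow = List[List[Tuple[float, float]]]  # (u, v) displacement per pixel


def downsample(img: Image) -> Image:
    """Halve resolution by averaging non-overlapping 2x2 blocks (assumed filter)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return the pyramid ordered from coarsest to finest."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr[::-1]


def upsample_flow(flow: Flow) -> Flow:
    """Nearest-neighbour 2x upsampling; vectors double to match the finer pixel grid."""
    out: Flow = []
    for row in flow:
        up = [(2.0 * u, 2.0 * v) for (u, v) in row for _ in range(2)]
        out.append(up)
        out.append(list(up))
    return out


def warp(img: Image, flow: Flow) -> Image:
    """Backward-warp img by flow (nearest-neighbour lookup, borders clamped)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            u, v = flow[y][x]
            sx = min(max(int(round(x + u)), 0), w - 1)
            sy = min(max(int(round(y + v)), 0), h - 1)
            row.append(img[sy][sx])
        out.append(row)
    return out


ResidualFn = Callable[[Image, Image, Flow], Flow]


def coarse_to_fine(img1: Image, img2: Image, levels: int,
                   residual_fn: ResidualFn) -> Flow:
    """Start from zero flow at the coarsest level; at each level upsample the
    previous estimate, warp img2 towards img1, and add a residual correction."""
    p1, p2 = build_pyramid(img1, levels), build_pyramid(img2, levels)
    flow: Flow = [[(0.0, 0.0)] * len(p1[0][0]) for _ in p1[0]]
    for k in range(levels):
        if k > 0:
            flow = upsample_flow(flow)
        warped = warp(p2[k], flow)
        residual = residual_fn(p1[k], warped, flow)  # hypothetical per-level network
        flow = [[(u + du, v + dv) for (u, v), (du, dv) in zip(f_row, r_row)]
                for f_row, r_row in zip(flow, residual)]
    return flow
```

With a `residual_fn` that always returns (1, 0), a three-level run ends at (7, 0) per pixel: 1 at the coarsest level, then 2·1 + 1 = 3, then 2·3 + 1 = 7. This shows why flow vectors are rescaled when moving up the pyramid, and why each level only has to learn a small correction.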

The spatial structure of the world extends to people and their motion. Humans have a very well-defined structure, and this information is useful in estimating optical flow for humans. To leverage this information, we create a synthetic dataset for human optical flow using a statistical human body model and motion capture sequences. We use this dataset to train deep networks and see significant improvement in the ability of the networks to estimate human optical flow.

The structure and geometry of the world affect the motion. Therefore, learning about the structure of the scene together with the motion can benefit both problems. To facilitate this, we introduce Competitive Collaboration, in which several neural networks are constrained by geometry and jointly learn about structure and motion in the scene without any labels. We show that jointly learning single-view depth prediction, camera motion, optical flow and motion segmentation using Competitive Collaboration achieves state-of-the-art results among unsupervised approaches.

Our findings provide support for our hypothesis that explicit constraints on structure and geometry of the world lead to better methods for motion estimation.

PhD Thesis [BibTex]


2011


ISocRob-MSL 2011 Team Description Paper for Middle Sized League

Messias, J., Ahmad, A., Reis, J., Sousa, J., Lima, P.

15th Annual RoboCup International Symposium 2011, July 2011 (techreport)

Abstract
This paper describes the status of the ISocRob MSL robotic soccer team as required by the RoboCup 2011 qualification procedures. The most relevant technical and scientific developments carried out by the team since its last participation in the RoboCup MSL competitions are detailed here. These include cooperative localization, cooperative object tracking, planning under uncertainty, obstacle detection and improvements to self-localization.

link (url) [BibTex]



Benchmark datasets for pose estimation and tracking

Andriluka, M., Sigal, L., Black, M. J.

In Visual Analysis of Humans: Looking at People, pages: 253-274, (Editors: Moeslund and Hilton and Krüger and Sigal), Springer-Verlag, London, 2011 (incollection)

publisher's site Project Page [BibTex]



Fields of experts

Roth, S., Black, M. J.

In Markov Random Fields for Vision and Image Processing, pages: 297-310, (Editors: Blake, A. and Kohli, P. and Rother, C.), MIT Press, 2011 (incollection)

Abstract
Fields of Experts are high-order Markov random field (MRF) models with potential functions that extend over large pixel neighborhoods. The clique potentials are modeled as a Product of Experts using nonlinear functions of many linear filter responses. In contrast to previous MRF approaches, all parameters, including the linear filters themselves, are learned from training data. A Field of Experts (FoE) provides a generic, expressive image prior that can capture the statistics of natural scenes, and can be used for a variety of machine vision tasks. The capabilities of FoEs are demonstrated with two example applications, image denoising and image inpainting, which are implemented using a simple, approximate inference scheme. While the FoE model is trained on a generic image database and is not tuned toward a specific application, the results compete with specialized techniques.
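To make the clique-potential structure concrete, here is a sketch of the FoE prior energy, i.e. the negative log of the unnormalised product of experts summed over all clique positions. It assumes Student-t experts, phi(r; alpha) = (1 + r²/2)^(-alpha), as the "nonlinear function" of each filter response; only "valid" cliques fully inside the image are counted, and the filters and alphas passed in are hand-picked stand-ins for the learned ones.

```python
# Sketch of the Fields-of-Experts prior energy. Assumptions: Student-t experts,
# "valid" cliques (filter fully inside the image), and hand-picked filters/alphas
# standing in for the learned ones.
import math
from typing import List, Sequence

Image = List[List[float]]


def foe_energy(img: Image, filters: Sequence[Image], alphas: Sequence[float]) -> float:
    """E(x) = sum_cliques sum_i alpha_i * log(1 + 0.5 * (J_i . x_c)^2),
    the negative log of the unnormalised product of Student-t experts."""
    h, w = len(img), len(img[0])
    energy = 0.0
    for J, alpha in zip(filters, alphas):
        fh, fw = len(J), len(J[0])
        for y in range(h - fh + 1):      # every clique position
            for x in range(w - fw + 1):
                # Linear filter response J_i . x_c on this clique.
                r = sum(J[dy][dx] * img[y + dy][x + dx]
                        for dy in range(fh) for dx in range(fw))
                energy += alpha * math.log1p(0.5 * r * r)
    return energy
```

For example, a single-row step image `[[0, 0, 1, 1]]` with the derivative filter `[[1, -1]]` and alpha = 1 has responses 0, -1, 0, so its energy is log(1.5). The logarithmic penalty grows much more slowly than a quadratic one, which is what lets such heavy-tailed priors tolerate occasional strong edges.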

publisher site [BibTex]



Dorsal Stream: From Algorithm to Neuroscience

Jhuang, H.

PhD Thesis, MIT, 2011 (techreport)

pdf [BibTex]


Steerable random fields for image restoration and inpainting

Roth, S., Black, M. J.

In Markov Random Fields for Vision and Image Processing, pages: 377-387, (Editors: Blake, A. and Kohli, P. and Rother, C.), MIT Press, 2011 (incollection)

Abstract
This chapter introduces the concept of a Steerable Random Field (SRF). In contrast to traditional Markov random field (MRF) models in low-level vision, the random field potentials of an SRF are defined in terms of filter responses that are steered to the local image structure. This steering uses the structure tensor to obtain derivative responses that are either aligned with, or orthogonal to, the predominant local image structure. Analysis of the statistics of these steered filter responses in natural images leads to the model proposed here. Clique potentials are defined over steered filter responses using a Gaussian scale mixture model and are learned from training data. The SRF model connects random fields with anisotropic regularization and provides a statistical motivation for the latter. Steering the random field to the local image structure improves image denoising and inpainting performance compared with traditional pairwise MRFs.
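The steering step can be sketched for a single pixel: accumulate the structure tensor from local gradients, take the orientation of its dominant eigenvector, and project the gradient onto that direction and its orthogonal. The choices below are assumptions for illustration, not the chapter's exact filters: forward differences, an unweighted (2r+1)x(2r+1) window, and the closed-form angle theta = ½·atan2(2·Txy, Txx − Tyy).

```python
# Sketch of structure-tensor steering for one pixel. Assumptions: forward
# differences, an unweighted (2r+1)x(2r+1) window, and the closed-form
# orientation of the dominant eigenvector; the chapter's filters may differ.
import math
from typing import List, Tuple

Image = List[List[float]]


def gradients(img: Image) -> Tuple[Image, Image]:
    """Forward-difference derivatives, zero at the right and bottom borders."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0.0 for x in range(w)]
          for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0.0 for x in range(w)]
          for y in range(h)]
    return gx, gy


def steered_derivatives(img: Image, y: int, x: int,
                        radius: int = 1) -> Tuple[float, float]:
    """Return (across, along): derivatives at (y, x) along the dominant gradient
    direction of the neighbourhood and along the orthogonal (edge) direction."""
    gx, gy = gradients(img)
    h, w = len(img), len(img[0])
    txx = txy = tyy = 0.0
    # Structure tensor: sum of gradient outer products over the local window.
    for yy in range(max(0, y - radius), min(h, y + radius + 1)):
        for xx in range(max(0, x - radius), min(w, x + radius + 1)):
            txx += gx[yy][xx] ** 2
            txy += gx[yy][xx] * gy[yy][xx]
            tyy += gy[yy][xx] ** 2
    # Angle of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * txy, txx - tyy)
    c, s = math.cos(theta), math.sin(theta)
    across = c * gx[y][x] + s * gy[y][x]
    along = -s * gx[y][x] + c * gy[y][x]
    return across, along
```

A vertical and a horizontal unit step both give approximately (1, 0) at a pixel next to the edge, even though their raw x/y derivatives differ. Expressing responses relative to the local structure rather than the image axes is what allows the potentials to treat edges of any orientation alike.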

publisher site [BibTex]



Spatial Models of Human Motion

Hauberg, S.

University of Copenhagen, 2011 (phdthesis)

PDF [BibTex]



Model-Based Pose Estimation

Pons-Moll, G., Rosenhahn, B.

In Visual Analysis of Humans: Looking at People, pages: 139-170, 9, (Editors: T. Moeslund, A. Hilton, V. Krueger, L. Sigal), Springer, 2011 (inbook)

book page pdf [BibTex]


2006


Implicit Wiener Series, Part II: Regularised estimation

Gehler, P., Franz, M.

(148), Max Planck Institute, 2006 (techreport)

pdf [BibTex]



HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion

Sigal, L., Black, M. J.

(CS-06-08), Brown University, Department of Computer Science, 2006 (techreport)

pdf abstract [BibTex]



Products of “Edge-perts”

Gehler, P., Welling, M.

In Advances in Neural Information Processing Systems 18, pages: 419-426, (Editors: Weiss, Y. and Schölkopf, B. and Platt, J.), MIT Press, Cambridge, MA, 2006 (incollection)

pdf [BibTex]


1997


Recognizing human motion using parameterized models of optical flow

Black, M. J., Yacoob, Y., Ju, X. S.

In Motion-Based Recognition, pages: 245-269, (Editors: Mubarak Shah and Ramesh Jain), Kluwer Academic Publishers, Boston, MA, 1997 (incollection)

pdf [BibTex]


1996


Mixture Models for Image Representation

Jepson, A., Black, M.

PRECARN ARK Project Technical Report ARK96-PUB-54, March 1996 (techreport)

Abstract
We consider the estimation of local greylevel image structure in terms of a layered representation. This type of representation has recently been used successfully to segment various objects from clutter using either optical flow or stereo disparity information. We argue that the same type of representation is useful for greylevel data in that it allows for the estimation of properties for each of several different components without prior segmentation. Our emphasis in this paper is on the process used to extract such a layered representation from a given image. In particular, we consider a variant of the EM algorithm for the estimation of the layered model and a novel technique for choosing the number of layers to use. We briefly consider the use of a simple version of this approach for image segmentation and suggest two potential applications to the ARK project.
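The abstract does not spell out the report's EM variant, so as a point of reference only, here is textbook EM for a one-dimensional Gaussian mixture over grey levels, with each component playing the role of one layer. It is not the report's variant, the number of components is fixed by the caller rather than chosen automatically, and the initial parameters in the usage example are arbitrary.

```python
# Plain EM for a 1-D Gaussian mixture over grey levels; each component stands in
# for one "layer". Textbook EM, not the report's variant; the number of
# components is fixed rather than chosen automatically.
import math
from typing import List, Sequence, Tuple


def em_gmm_1d(data: Sequence[float], means: Sequence[float],
              variances: Sequence[float], weights: Sequence[float],
              iters: int = 50, min_var: float = 1e-6
              ) -> Tuple[List[float], List[float], List[float], List[List[float]]]:
    mu, var, wts = list(means), list(variances), list(weights)
    k, n = len(mu), len(data)
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior layer ownership of each sample (log-sum-exp for stability).
        resp = []
        for x in data:
            logp = [math.log(wts[j]) - 0.5 * math.log(2 * math.pi * var[j])
                    - (x - mu[j]) ** 2 / (2 * var[j]) for j in range(k)]
            m = max(logp)
            e = [math.exp(lp - m) for lp in logp]
            s = sum(e)
            resp.append([v / s for v in e])
        # M-step: re-estimate each layer from its soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:  # empty component: keep its previous parameters
                continue
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, data)) / nj, min_var)
            wts[j] = nj / n
    return mu, var, wts, resp
```

On ten grey values clustered around 0.1 and 0.9, starting from means 0.3 and 0.7, the two components settle on the cluster averages (about 0.104 and 0.902) with equal weights, and each sample ends up owned almost entirely by one layer without any prior segmentation.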

pdf [BibTex]
