I am a research group leader in the Department of Perceiving Systems at the Max Planck Institute for Intelligent Systems. My group is funded by the DFG through the CRC 1233 on Robust Vision.
I am interested in the intersection of computer vision and machine learning, with a focus on holistic visual scene understanding. In particular, I am interested in analyzing and modeling people in complex visual scenes.
Offers: I am looking for highly motivated PhD students and PhD interns. I also have projects for bachelor's and master's theses. If you are interested, please contact me directly or send your application to firstname.lastname@example.org
(New) Our work on part-aligned bilinear representations for person re-identification is online.
(New) Our work on human action segmentation in real time is online, and the code is available.
I will be an area chair for ACCV 2018.
I received an Early career research grant to start my own research group at the Max Planck Institute for Intelligent Systems and the University of Tübingen, details coming soon. I am looking for highly motivated PhD students and PhD interns!
I have successfully defended my PhD thesis "People Detection and Tracking in Crowded Scenes" on the 29th September 2017 at the Max Planck Institute for Informatics. Thesis Committee: Prof. Bernt Schiele, Prof. Michael Black, Prof. Luc Van Gool.
Winner of the CVPR 2017 Multi-Object Tracking Challenge (MOT17).
Four papers accepted at CVPR 2017!
Winner of the Multi-Object Tracking Challenge at CVPR 2017
Winner of the Multi-Object Tracking Challenge at ECCV 2016
BMVC Best Paper Award, 2012
Scholarship for excellence in academic performance RWTH Aachen 2009, 2010
SS 2016: High-Level Computer Vision, Saarland University, teaching assistant
SS 2015: High-Level Computer Vision, Saarland University, teaching assistant
SS 2013: High-Level Computer Vision, Saarland University, teaching assistant
We propose a novel network that learns a part-aligned representation for person re-identification. It handles the body part misalignment problem, that is, body parts are misaligned across human detections due to pose/viewpoint change and unreliable detection. Our model consists of a two-stream network (one stream for appearance map extraction and the other one for body part map extraction) and a bilinear-pooling layer that generates and spatially pools a part-aligned map. Each local feature of the part-aligned map is obtained by a bilinear mapping of the corresponding local appearance and body part descriptors. Our new representation leads to a robust image matching similarity, which is equivalent to an aggregation of the local similarities of the corresponding body parts combined with the weighted appearance similarity. This part-aligned representation reduces the part misalignment problem significantly. Our approach is also advantageous over other pose-guided representations (e.g., extracting representations over the bounding box of each body part) by learning part descriptors optimal for person re-identification. For training the network, our approach does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network, and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demonstrating its superiority over the state-of-the-art methods on the standard benchmark datasets, including Market-1501, CUHK03, CUHK01 and DukeMTMC, and standard video dataset MARS.
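The link between bilinear pooling and aggregated local similarities can be checked numerically on toy data. The following is only a minimal sketch, not the network from the paper: it assumes plain sum pooling without normalization, uses random vectors in place of learned appearance and part descriptors, and all function names and sizes are hypothetical. It shows that the inner product of two pooled descriptors equals the sum, over all pairs of locations, of appearance similarity multiplied by part similarity.

```python
import random


def outer_flat(a, p):
    """Flattened outer product of an appearance vector a and a part vector p."""
    return [ai * pj for ai in a for pj in p]


def bilinear_pool(app_map, part_map):
    """Sum-pool the per-location outer products into one descriptor.

    Assumption: plain sum pooling, no normalization.
    """
    dim = len(app_map[0]) * len(part_map[0])
    pooled = [0.0] * dim
    for a, p in zip(app_map, part_map):
        for k, v in enumerate(outer_flat(a, p)):
            pooled[k] += v
    return pooled


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def aggregated_local_similarity(app1, part1, app2, part2):
    """Sum over all location pairs of appearance similarity times part similarity."""
    return sum(
        dot(a1, a2) * dot(p1, p2)
        for a1, p1 in zip(app1, part1)
        for a2, p2 in zip(app2, part2)
    )


def random_map(rng, n_locations, n_channels):
    """Toy stand-in for a feature map: one random vector per spatial location."""
    return [[rng.uniform(-1, 1) for _ in range(n_channels)]
            for _ in range(n_locations)]


rng = random.Random(0)
# Hypothetical sizes: 6 locations, 4 appearance channels, 3 part channels.
app1, part1 = random_map(rng, 6, 4), random_map(rng, 6, 3)
app2, part2 = random_map(rng, 6, 4), random_map(rng, 6, 3)

global_sim = dot(bilinear_pool(app1, part1), bilinear_pool(app2, part2))
local_sim = aggregated_local_similarity(app1, part1, app2, part2)
print(abs(global_sim - local_sim) < 1e-9)
```

Under these assumptions, a pair of locations contributes strongly to the image similarity only when both the part descriptors and the appearance descriptors agree, which is one way to read the part-alignment claim above.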
We present an effective dynamic clustering algorithm for the task of temporal human action segmentation, which has comprehensive applications such as robotics, motion analysis, and patient monitoring. Our proposed algorithm is unsupervised, fast, generic to process various types of features, and applicable in both the online and offline settings. We perform extensive experiments of processing data streams, and show that our algorithm achieves the state-of-the-art results for both online and offline settings.
Insafutdinov, E., Andriluka, M., Pishchulin, L., Tang, S., Levinkov, E., Andres, B., Schiele, B.
Articulated Multi-person Tracking in the Wild
In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, July 2017, Oral (inproceedings)
Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P., Schiele, B.
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)
This paper considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity of each other.
This joint formulation is in contrast to previous strategies that address the problem by first detecting people and subsequently estimating their body pose. We propose a partitioning and labeling formulation of a set of body-part hypotheses generated with CNN-based part detectors. Our formulation, an instance of an integer linear program, implicitly performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts respecting geometric and appearance constraints. Experiments on four different datasets demonstrate state-of-the-art results for both single-person and multi-person pose estimation.
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems.