Semantic segmentation is another fundamental problem in computer vision: answering what is where in a given image, video, or point cloud. The best-performing recent techniques require human annotations, which are costly and time-consuming to obtain. Such human-annotated ground truth is used to train models that then perform inference on test images or point clouds. In this project, we address the following two questions.
- How can we acquire accurate training data with minimal human annotation cost [ ]?
- How can we build fast and efficient models for test-time inference that leverage the collected data [ ], [ ]?
In [ ], we developed a scalable technique for generating pixelwise image annotations. Given a 3D reconstructed scene, we coarsely annotate its static elements and transfer these annotations into the image domain using a novel label propagation technique that exploits geometric constraints. We applied our method to a novel suburban video dataset that we collected, obtaining 400k semantic and instance image annotations.
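The core geometric step behind such 3D-to-2D label transfer can be sketched as projecting labeled 3D points into each frame through a pinhole camera model. The function below is a minimal illustration under assumed inputs (point cloud, labels, calibrated camera pose); it is not the paper's propagation method, which additionally regularizes the sparse projections.

```python
import numpy as np

def project_labels(points, labels, K, R, t, image_shape):
    """Project labeled 3D points into an image via a pinhole camera model.

    points: (N, 3) 3D points in world coordinates (hypothetical data)
    labels: (N,) integer semantic label per point
    K: (3, 3) camera intrinsics; R, t: world-to-camera rotation/translation
    Returns an (H, W) label map, -1 where no point projects; those gaps
    would be filled by a subsequent label propagation step.
    """
    H, W = image_shape
    cam = points @ R.T + t            # world -> camera coordinates
    in_front = cam[:, 2] > 0          # keep points in front of the camera
    cam, lab = cam[in_front], labels[in_front]
    uv = cam @ K.T                    # apply intrinsics
    uv = uv[:, :2] / uv[:, 2:3]       # perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    label_map = np.full((H, W), -1, dtype=int)
    # Poor man's z-buffer: write farthest points first so nearer
    # points overwrite them and occlusions are handled.
    order = np.argsort(-cam[ok, 2])
    label_map[v[ok][order], u[ok][order]] = lab[ok][order]
    return label_map
```

Because a single annotated 3D element projects into every frame that sees it, one rough 3D annotation yields labels in many images, which is what makes this kind of pipeline scale.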
In [ ], [ ], we introduced fast and efficient semantic segmentation techniques that propagate information using the well-established Auto-Context and bilateral filtering frameworks. Empirical results on several standard benchmarks show significant improvements over strong baselines and related approaches.
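To illustrate the bilateral-filtering idea in this setting, the sketch below applies a brute-force cross-bilateral filter to per-pixel class scores, using the image colors as the guidance signal: pixels that are close in both position and color share score mass, so information propagates within objects but not across strong edges. The function and its parameters are illustrative assumptions, not the optimized filtering used in the cited work.

```python
import numpy as np

def bilateral_label_smoothing(scores, image, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Cross-bilateral smoothing of per-class scores guided by image colors.

    scores: (H, W, C) per-pixel class scores (e.g. classifier outputs)
    image:  (H, W, 3) guidance image, values assumed in [0, 1]
    Note: np.roll wraps at the borders; acceptable for a sketch.
    """
    H, W, C = scores.shape
    out = np.zeros_like(scores, dtype=float)
    weight = np.zeros((H, W))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # spatial weight for this neighbor offset
            ws = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            shifted = np.roll(image, (dy, dx), axis=(0, 1))
            # range weight: small when the neighbor's color differs
            diff = ((image - shifted) ** 2).sum(axis=2)
            w = ws * np.exp(-diff / (2 * sigma_r ** 2))
            out += w[..., None] * np.roll(scores, (dy, dx), axis=(0, 1))
            weight += w
    return out / weight[..., None]
```

Because the filter is edge-aware, averaging the scores sharpens segment boundaries rather than blurring them, which is the property these propagation schemes exploit; efficient implementations replace the brute-force loop with high-dimensional filtering.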