Since the release of the Kinect, RGB-D cameras have been used in several consumer devices, including smartphones. In this talk, I will present two challenging uses of this technology. With multiple RGB-D cameras, it is possible to reconstruct a 3D scene and visualize it from any point of view. In the first part of the talk, I will show how such a scene can be streamed and rendered as a point cloud in a compelling way and its appearance improved by the use of external cinema cameras. In the second part of the talk, I will present my work on how an RGB-D camera can be used for enabling real-walking in virtual reality by making the user aware of the surrounding obstacles. I present a pipeline to create an occupancy map from a point cloud on the fly on a mobile phone used as a virtual reality headset. This occupancy map can then be used to prevent the user from hitting physical obstacles when walking in the virtual scene.
Organizers: Sergi Pujades
Organizers: Ahmed Osman
Visual perception involves a complex interaction between feedforward and feedback processes. A mechanistic understanding of these processing, and its limitations, is a necessary first step towards elucidating key aspects of perceptual functions and dysfunctions. In this talk, I will review our ongoing effort towards the understanding of how feedback visual processing operates at the level of the thalamus, a dynamic relay station halfway between the retina and the cortex. I will present experimental evidence from several recent electrophysiology studies performed on subjects engaged in visual detection tasks. The results show that modulatory driving provided by top-down processes (the feedback from primary visual cortex) critically influences the ongoing thalamic activity and shapes the message to be delivered to the cortex. When neuromodulatory techniques (Transcranial Magnetic Stimulation or static magnetic fields) are used to transiently disrupt cortical activity two very interesting effects show up: (1) alterations in stimulus detection and (2) the spatial properties of thalamic receptive fields are dramatically modified. Finally, I will show how sensory information can be a powerful tool to interact with the motor system and re-organize altered patterns of movement in neurological disorders such as Parkinson's disease.
Organizers: Daniel Cudeiro
Disney Research has been actively pushing the state-of-the-art in digitizing humans over the past decade, impacting both academia and industry. In this talk I will give an overview of a selected few projects in this area, from research into production. I will be talking about photogrammetric shape acquisition and dense performance capture for faces, eye and teeth scanning and parameterization, as well as physically based capture and modelling for hair and volumetric tissues.
Organizers: Timo Bolkart
The definition of art has been debated for more than 1000 years, and continues to be a puzzle. While scientific investigations offer hope of resolving this puzzle, machine learning classifiers that discriminate art from non-art images generally do not provide an explicit definition, and brain imaging and psychological theories are at present too coarse to provide a formal characterization. In this work, rather than approaching the problem using a machine learning approach trained on existing artworks, we hypothesize that art can be defined in terms of preexisting properties of the visual cortex. Specifically, we propose that a broad subset of visual art can be defined as patterns that are exciting to a visual brain. Resting on the finding that artificial neural networks trained on visual tasks can provide predictive models of processing in the visual cortex, our definition is operationalized by using a trained deep net as a surrogate “visual brain”, where “exciting” is defined as the activation energy of particular layers of this net. We find that this definition easily discriminates a variety of art from non-art, and further provides a ranking of art genres that is consistent with our subjective notion of ‘visually exciting’. By applying a deep net visualization technique, we can also validate the definition by generating example images that would be classified as art. The images synthesized under our definition resemble visually exciting art such as Op Art and other human- created artistic patterns.
Organizers: Michael Black
One of the central problems of artificial intelligence is machine perception, i.e., the ability to understand the visual world based on input from sensors such as cameras. In this talk, I will present recent progress with respect to data generation using weak annotations, motion information and synthetic data. I will also discuss our recent results for action recognition, where human tubes and tubelets have shown to be successful. Our tubelets moves away from state-of-the-art frame based approaches and improve classification and localization by relying on joint information from several frames. I also show how to extend this type of method to weakly supervised learning of actions, which allows us to scale to large amounts of data with sparse manual annotation. Furthermore, I discuss several recent extensions, including 3D pose estimation.
Organizers: Ahmed Osman
Quantifying behavior is crucial for many applications in neuroscience. Videography provides easy methods for the observation and recording of animal behavior in diverse settings, yet extracting particular aspects of a behavior for further analysis can be highly time consuming. In motor control studies, humans or other animals are often marked with reflective markers to assist with computer-based tracking, yet markers are intrusive (especially for smaller animals), and the number and location of the markers must be determined a priori. Here, we present a highly efficient method for markerless tracking based on transfer learning with deep neural networks that achieves excellent results with minimal training data. We demonstrate the versatility of this framework by tracking various body parts in a broad collection of experimental settings: mice odor trail-tracking, egg-laying behavior in drosophila, and mouse hand articulation in a skilled forelimb task. For example, during the skilled reaching behavior, individual joints can be automatically tracked (and a confidence score is reported). Remarkably, even when a small number of frames are labeled (≈200), the algorithm achieves excellent tracking performance on test frames that is comparable to human accuracy.
Organizers: Melanie Feldhofer
Human shape estimation is an important task for video editing, animation and fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.
Human footsteps can provide a unique behavioural pattern for robust biometric systems. Traditionally, security systems have been based on passwords or security access cards. Biometric recognition deals with the design of security systems for automatic identification or verification of a human subject (client) based on physical and behavioural characteristics. In this talk, I will present spatio-temporal raw and processed footstep data representations designed and evaluated on deep machine learning models based on a two-stream resnet architecture, by using the SFootBD database the largest footstep database to date with more than 120 people and almost 20,000 footstep signals. Our models deliver an artificial intelligence capable of effectively differentiating the fine-grained variability of footsteps between legitimate users (clients) and impostor users of the biometric system. We provide experimental results in 3 critical data-driven security scenarios, according to the amount of footstep data available for model training: at airports security checkpoints (smallest training set), workspace environments (medium training set) and home environments (largest training set). In these scenarios we report state-of-the-art footstep recognition rates.
Organizers: Dimitrios Tzionas
Animals are widespread in nature and the analysis of their shape and motion is of importance in many fields and industries. Modeling 3D animal shape, however, is difficult because the 3D scanning methods used to capture human shape are not applicable to wild animals or natural settings. In our previous SMAL model, we learn animal shape from toys figurines, but toys are limited in number and realism, and not every animal is sufficiently popular for there to be realistic toys depicting it. What is available in large quantities are images and videos of animals from nature photographs, animal documentaries, and webcams. In this talk I will present our recent work for capturing the detailed 3D shape of animals from images alone. Our method extracts significantly more 3D shape detail than previous work and is able to model new species using only a few video frames. Additionally, we extract realistic texture map from images for capturing both animal shape and appearance.
Active vision has long put forward the idea, that visual sensation and our actions are inseparable, especially when considering naturalistic extended behavior. Further support for this idea comes from theoretical work in optimal control, which demonstrates that sensing, planning, and acting in sequential tasks can only be separated under very restricted circumstances. The talk will present experimental evidence together with computational explanations of human visuomotor behavior in tasks ranging from classic psychophysical detection tasks to ball catching and visuomotor navigation. Along the way it will touch topics such as the heuristics hypothesis and learning of visual representations. The connecting theme will be that, from the switching of visuomotor behavior in response to changing task-constraints down to cortical visual representations in V1, action and perception are inseparably intertwined in an ambiguous and uncertain world
Organizers: Betty Mohler