This talk aims to argue for a fine-grained perspective onto human-object interactions, from video sequences. I will present approaches for the understanding of ‘what’ objects one interacts with during daily activities, ‘when’ should we label the temporal boundaries of interactions, ‘which’ semantic labels one can use to describe such interactions and ‘who’ is better when contrasting people perform the same interaction. I will detail my group’s latest works on sub-topics related to: (1) assessing action ‘completion’ – when an interaction is attempted but not completed [BMVC 2018], (2) determining skill or expertise from video sequences [CVPR 2018] and (3) finding unequivocal semantic representations for object interactions [ongoing work].
I will also introduce EPIC-KITCHENS 2018, the recently released largest dataset of object interactions in people’s homes, recorded using wearable cameras. The dataset includes 11.5M frames fully annotated with objects and actions, based on unique annotations from the participants narrating their own videos, thus reflecting true intention. Three open challenges are now available on object detection, action recognition and action anticipation [http://epic-kitchens.github.io]
Biography: Dima Damen: Reader (Associate Professor) in Computer Vision at the University of Bristol, United Kingdom. Received her PhD from the University of Leeds (2009). Dima's research interests are in the automatic understanding of object interactions, actions and activities using static and wearable visual (and depth) sensors. Dima co-chaired BMVC 2013, is area chair for BMVC (2014-2018), associate editor of Pattern Recognition (2017-). She was selected as a Nokia Research collaborator in 2016, and as an Outstanding Reviewer in ICCV17, CVPR13 and CVPR12. She currently supervises 8 PhD students, and 3 postdoctoral researchers. More details at: [http://dimadamen.github.io]