Conversational agents in the form of virtual agents or social robots are rapidly becoming widespread. In human-human interactions, humans use non-verbal behaviors to signal their intent, emotions and attitudes. Conversational agents therefore need this ability as well to make interactions pleasant and efficient. An important part of non-verbal communication is gesticulation: gestures carry a large share of non-verbal content. Previous gesture production systems were typically rule-based and could not represent the full range of human gestures. Recently, the gesture generation field has shifted toward data-driven approaches. We follow this line of research by extending a state-of-the-art deep-learning-based model. Our model leverages representation learning to enhance the speech-gesture mapping. We analyze different representations for the input (speech) and the output (motion) of the network using both objective and subjective evaluations. We also analyze the importance of smoothing the produced motion and emphasize how challenging it is to evaluate gesture quality.
In the future, we plan to enrich the input signal with semantic context (text transcriptions), make the model probabilistic, and evaluate our system on the social robot NAO.
Biography: Taras Kucherenko is a Ph.D. student at the Robotics, Perception and Learning lab under Prof. Hedvig Kjellström at KTH Royal Institute of Technology in Stockholm. His current research focuses on building generative models for non-verbal behavior generation, enabling social agents to use body language as an additional communication tool. He received his MSc in machine learning from RWTH Aachen, with an emphasis on natural language processing, and his BSc in applied mathematics from KPI, Kyiv, with an emphasis on cryptography.