Content provided by Taras Kucherenko, first author of the paper Gesticulator: A framework for semantically-aware speech-driven gesture generation.
Human communication is, to no small extent, non-verbal. While talking, people spontaneously gesticulate, and these gestures play a crucial role in conveying information. Think about the hand, arm, and body motions we make when we talk. Our research is on machine learning models for non-verbal behavior generation, such as hand gestures and facial expressions, with a main focus on hand gesture generation. We develop machine learning methods that enable virtual agents (such as avatars in a computer game) to communicate non-verbally.
What’s New: Unfortunately, the best AI methods for generating gestures have so far been “either-or”: either they follow the rhythm of the speech, or they pay attention to the meaning of the words being spoken.
This paper presents the first modern AI system that takes both speech rhythm and meaning into account simultaneously. As a result, it can generate a much broader range of gestures than was possible before: both those related to the rhythm of the speech and those associated with its meaning. This new research direction will enable characters in computer games and VR that are more engaging and authentic.
How It Works: Our model is based on deep learning. As with any other machine learning method, it works by optimizing parameters based on a dataset. We have speech audio and text as input and the corresponding motion sequence as output. The model is a deep neural network with an autoregressive connection: the model's predictions are fed back into the model to ensure motion continuity. Once trained, our model can generate gestures for novel speech from its audio and text.
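To make the autoregressive idea concrete, here is a minimal sketch in plain Python. It is not the Gesticulator architecture: a single random linear map stands in for the trained deep network, and the feature sizes (`AUDIO_DIM`, `TEXT_DIM`, `POSE_DIM`), the function names, and the zero starting pose are all assumptions made up for illustration. It only shows how each predicted motion frame is fed back as an input for the next frame, alongside that frame's audio and text features.

```python
import random

# Hypothetical feature sizes, chosen only for illustration.
AUDIO_DIM = 4   # per-frame audio features
TEXT_DIM = 3    # per-frame text features
POSE_DIM = 2    # one motion frame (e.g. a few joint angles)


def make_weights(rows, cols, rng):
    """Random small weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def predict_frame(weights, audio, text, prev_pose):
    """One step: concatenate speech features with the previous pose
    and apply a linear map (a stand-in for the deep network)."""
    x = audio + text + prev_pose
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def generate_motion(weights, audio_seq, text_seq):
    """Autoregressive rollout: each predicted pose is fed back
    as an input when predicting the next frame."""
    prev_pose = [0.0] * POSE_DIM  # assumed neutral starting pose
    motion = []
    for audio, text in zip(audio_seq, text_seq):
        prev_pose = predict_frame(weights, audio, text, prev_pose)
        motion.append(prev_pose)
    return motion


rng = random.Random(0)
weights = make_weights(POSE_DIM, AUDIO_DIM + TEXT_DIM + POSE_DIM, rng)
audio_seq = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(5)]
text_seq = [[rng.random() for _ in range(TEXT_DIM)] for _ in range(5)]

motion = generate_motion(weights, audio_seq, text_seq)
print(len(motion), len(motion[0]))  # 5 frames, each with POSE_DIM values
```

Because every frame is conditioned on the one before it, consecutive poses cannot be predicted independently of each other, which is the intuition behind using the feedback connection for motion continuity.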
Key Insights: If we want interaction with social agents (such as robots or virtual avatars) to be natural and smooth, we need them to gesticulate. “Gesticulator” is the first step toward generating meaningful gestures by a machine-learning method.
This research received the Best Paper Award at ICMI 2020 for the paper “Gesticulator: A framework for semantically-aware speech-driven gesture generation”.
The paper Gesticulator: A framework for semantically-aware speech-driven gesture generation is on arXiv.
Meet the author Taras Kucherenko, Ph.D. student in Machine Learning for Social Robotics at KTH Royal Institute of Technology in Stockholm.