ERICA: The ERATO Intelligent Conversational Android

Introduction

This paper introduces ERICA, an autonomous android system capable of conversational interaction. It features advanced sensing and speech synthesis technologies, and is arguably the most human-like android built to date.

ERICA’s human-likeness stems from her visual design, facial expressiveness, and highly expressive speech synthesizer. Her sensing technologies are among the most capable to date, with high-performance speech recognition, the ability to discriminate between sound sources using microphone arrays, and precise tracking of people’s locations and movements.

The ultimate goal for ERICA is to communicate in a convincingly human-like way in face-to-face interactions.

Figure 1: Photograph of ERICA.

Background of Current Androids

Limitations

In recent years, androids have become increasingly visible in both research and the popular media. Android replicas of celebrities and individuals are appearing in the news, and androids are depicted in film and television living and working alongside people in daily life. However, today’s androids are often very limited in their ability to conduct autonomous conversational interactions. Currently, androids can be classified into the following categories:

Platform Architecture

Here we introduce the ERICA platform architecture:

Hardware and Actuation

The mechanical and aesthetic design of ERICA was developed together with the android manufacturer A-Lab (http://www.a-lab-japan.co.jp/en/).

1. External Appearance

Her facial proportions were determined using principles of beauty theory from cosmetic surgery. One is the ideal angle and ratios of the so-called “Venus line” (or Baum ratio), which defines the angle of projection of the nose. Another is the “1/3 rule”, which specifies equal vertical spacing between the chin, nose, eyebrows, and hairline [7].
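The “1/3 rule” can be expressed as a simple check on landmark positions. The sketch below is our own illustration, not the procedure used to design ERICA. The function names `facial_thirds` and `thirds_deviation` are assumptions, and so is the convention that landmarks are vertical coordinates increasing from hairline to chin.

```python
from typing import List


def facial_thirds(hairline: float, brows: float, nose: float, chin: float) -> List[float]:
    """Fraction of total face height taken by the upper, middle, and lower thirds.

    Hypothetical helper: inputs are vertical coordinates (e.g. pixels, increasing
    downward) of the hairline, eyebrows, base of the nose, and chin.
    """
    landmarks = (hairline, brows, nose, chin)
    if any(lower <= upper for upper, lower in zip(landmarks, landmarks[1:])):
        raise ValueError("landmarks must increase strictly from hairline to chin")
    total = chin - hairline
    return [(lower - upper) / total for upper, lower in zip(landmarks, landmarks[1:])]


def thirds_deviation(hairline: float, brows: float, nose: float, chin: float) -> float:
    """Largest absolute deviation of any third from the ideal proportion of 1/3."""
    return max(abs(part - 1 / 3) for part in facial_thirds(hairline, brows, nose, chin))
```

Consider landmarks at 0, 60, 120, and 180 pixels. Each segment then occupies one third of the face height, so the deviation is zero up to floating-point rounding.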

ERICA’s body has 44 degrees of freedom (DOF), 19 of which are controllable, as depicted in Fig. 2. The skeletal axes shown in black in Fig. 2 (right) are actuated.

Figure 2: Degrees of freedom in ERICA. Left: Facial degrees of freedom. Right: Skeletal degrees of freedom. Joints marked in black are active joints, and joints drawn in white are passive.

2. Speech Synthesis

ERICA’s speech synthesis uses a custom voice designed for Hoya’s VoiceText software (http://voicetext.jp/). By default, most sentences are rendered smoothly, with intonation determined by grammar. Pitch, speed, and intensity can also be specified manually. The audio signal generated by the speech synthesizer is sent back to the robot to generate lip sync and body rhythm behaviors, as shown in Fig. 3.
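The paper does not specify how lip sync is derived from the synthesized audio. One simple, common approach, sketched here purely as an assumption, maps the short-term loudness (RMS amplitude) of each audio frame to a mouth-opening command. The function name, the 20 ms frame length, the gain, and the smoothing factor are all hypothetical choices.

```python
import numpy as np


def mouth_opening_from_audio(
    samples: np.ndarray,
    sample_rate: int,
    frame_ms: float = 20.0,
    gain: float = 4.0,
    smoothing: float = 0.5,
) -> np.ndarray:
    """Map a mono speech waveform to per-frame mouth-opening commands in [0, 1].

    Hypothetical sketch: RMS amplitude per frame, scaled by `gain` and clipped,
    then low-pass filtered so the jaw does not jitter between frames.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    # Drop any trailing partial frame and split the signal into equal frames.
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    target = np.clip(gain * rms, 0.0, 1.0)

    opening = np.empty_like(target)
    previous = 0.0
    for i, value in enumerate(target):
        # Exponential smoothing: blend the new target with the previous command.
        previous = smoothing * previous + (1.0 - smoothing) * value
        opening[i] = previous
    return opening
```

Each 20 ms command could then drive a jaw actuator. The smoothing factor trades responsiveness against jitter.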

3. Sensing

ERICA currently uses external sensors on a wired network for human position tracking, sound source localization, and recognition of speech and prosodic information. The elements of the sensing framework are shown on the left side of Fig. 3.

Figure 3: System diagram illustrating sensor inputs, internal control logic, and interaction with speech synthesis and motion generation.
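Sound source localization with microphone arrays typically relies on differences in arrival time between microphones. The sketch below shows a generic two-microphone time-difference-of-arrival (TDOA) estimate under a far-field assumption. It is not ERICA’s actual localization method, which is not described here. The function names, the microphone spacing, and the speed-of-sound constant are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def estimate_delay_samples(mic_a: np.ndarray, mic_b: np.ndarray) -> int:
    """Delay (in samples) of mic_a relative to mic_b, from the cross-correlation peak."""
    corr = np.correlate(mic_a, mic_b, mode="full")
    # Index k of the "full" output corresponds to a lag of k - (len(mic_b) - 1).
    return int(np.argmax(corr)) - (len(mic_b) - 1)


def bearing_degrees(delay_samples: int, sample_rate: int, mic_spacing_m: float) -> float:
    """Far-field direction of arrival relative to the broadside of the microphone pair.

    A positive delay means the sound reached mic_b first, so positive angles
    point toward mic_b's side.
    """
    path_difference = SPEED_OF_SOUND * delay_samples / sample_rate
    ratio = np.clip(path_difference / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```

Take a 16 kHz sample rate and microphones 0.2 m apart. A delay of 5 samples then corresponds to a path difference of about 0.107 m, or a bearing of roughly 32 degrees. Real arrays use more microphones and more robust correlation weighting.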

4. Control Architecture

The software architecture for the ERICA platform combines a memory model, a set of behavior modules for generating dynamic movements, and a flexible software infrastructure supporting dialog management. The center area of Fig. 3 illustrates the core elements of the interaction logic.
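The paper does not say how the outputs of the behavior modules are combined. One common pattern is priority-based arbitration, shown here purely as an assumption. In this scheme each module proposes joint targets, and for each joint the highest-priority proposal wins. The `BehaviorModule` class, the joint names, and the priorities are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

JointTargets = Dict[str, float]


@dataclass
class BehaviorModule:
    """Hypothetical behavior module that proposes joint targets from the current state."""

    name: str
    priority: int  # higher priority wins when two modules command the same joint
    propose: Callable[[dict], JointTargets]


def arbitrate(modules: List[BehaviorModule], state: dict) -> JointTargets:
    """Merge proposals so that, per joint, the highest-priority module's value is kept."""
    merged: JointTargets = {}
    # Apply modules from lowest to highest priority; later updates overwrite earlier ones.
    # sorted() is stable, so among equal priorities the module listed last wins.
    for module in sorted(modules, key=lambda m: m.priority):
        merged.update(module.propose(state))
    return merged


breathing = BehaviorModule("breathing", 0, lambda s: {"shoulders": 0.1, "torso_pitch": 0.05})
gaze = BehaviorModule("gaze", 1, lambda s: {"head_yaw": s["speaker_bearing"]})
speech_rhythm = BehaviorModule(
    "speech_rhythm", 2, lambda s: {"torso_pitch": 0.2 if s["speaking"] else 0.0}
)
targets = arbitrate([gaze, speech_rhythm, breathing], {"speaker_bearing": 30.0, "speaking": True})
```

Here the speech-rhythm module overrides the breathing module’s torso pitch while speaking. Joints that only one module commands pass through unchanged.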

Public Demonstration

In the public demonstration, members of the press and the public were invited on stage to direct questions at ERICA or the researchers using a wireless microphone, as shown in the photo in Fig. 4.

Figure 4: Photo of the public demonstration.

A list of 30 topics was shown on a projection screen, and visitors took turns asking ERICA about them. After answering each question, ERICA asked a question in return, chosen based on the dialog state history. For example (translated from Japanese):

Visitor: How old are you?
ERICA: I’m 23 years old. Even though I was just built, please don’t call me 0 years old. (laughs)
ERICA: Do you think I look older?
Visitor: Yes, I think so.
ERICA: (giggles and smiles) Thanks! People always think I look younger, so I’m happy to hear that.
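The actual representation of ERICA’s dialog state is not described. As a minimal sketch, assume each topic stores an answer and a list of candidate follow-up questions, and the state records which follow-ups were already asked. The `Topic` and `DialogState` classes and the second follow-up question are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Topic:
    answer: str
    follow_ups: List[str]  # questions ERICA may ask back after answering


@dataclass
class DialogState:
    history: List[str] = field(default_factory=list)  # follow-up questions already asked


def respond(topics: Dict[str, Topic], topic: str, state: DialogState) -> Tuple[str, Optional[str]]:
    """Answer a question on `topic`, then choose a follow-up not yet in the history."""
    entry = topics[topic]
    follow_up = next((q for q in entry.follow_ups if q not in state.history), None)
    if follow_up is not None:
        state.history.append(follow_up)
    return entry.answer, follow_up


topics = {
    "age": Topic(
        answer="I'm 23 years old.",
        follow_ups=["Do you think I look older?", "Do you think I look younger?"],
    )
}
state = DialogState()
first = respond(topics, "age", state)
second = respond(topics, "age", state)
```

Consulting the history keeps ERICA from repeating a question when the same topic comes up again. Once the candidates are exhausted, she simply answers without asking back.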

ERICA also responded to utterances of the researchers and the MC at different times in the demonstration. The visitor, the MC, and the two researchers each had separate microphones, and each microphone was independently processed for speech recognition and prosodic information. This enabled ERICA to respond to each person in an appropriate way. For example:

Researcher: (Turns to ERICA after answering a visitor’s question). ERICA, you’re the greatest robot ever, aren’t you?
ERICA: (Turns to the researcher and smiles) Yes! (Then, after a short pause, makes a worried expression) Well… actually, we’ll see. That depends on how well my researchers program me.
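Because each microphone was processed independently, every recognition result can be attributed to a known speaker. The sketch below is a hypothetical illustration of that routing step. It assumes each microphone is assigned to one person and that the position-tracking system supplies each person’s current bearing in degrees. The names `Utterance` and `route_utterance`, the microphone IDs, and the bearings are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Utterance:
    mic_id: str  # which microphone channel produced this recognition result
    text: str


def route_utterance(
    utterance: Utterance,
    mic_to_role: Dict[str, str],
    tracked_bearings: Dict[str, float],
) -> Dict[str, object]:
    """Attribute an utterance to a speaker and pick where ERICA should look.

    Hypothetical: the mapping from microphone to person is fixed for the session,
    while bearings come from the position-tracking system.
    """
    role = mic_to_role[utterance.mic_id]
    return {
        "speaker": role,
        "gaze_bearing": tracked_bearings[role],
        "text": utterance.text,
    }


mic_to_role = {"mic_v": "visitor", "mic_mc": "mc", "mic_r1": "researcher_1", "mic_r2": "researcher_2"}
bearings = {"visitor": 0.0, "mc": -45.0, "researcher_1": 40.0, "researcher_2": 55.0}
event = route_utterance(
    Utterance("mic_r1", "ERICA, you're the greatest robot ever, aren't you?"), mic_to_role, bearings
)
```

The resulting speaker label could select a role-specific response set, and the bearing would let ERICA turn toward the researcher before replying.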

Achievements and Future Work

Hardware Platform

At least one news outlet, Mashable (http://mashable.com/2015/08/12/erica-android-japan/), reported on the demonstration with the headline “Japan’s Erica android isn’t as creepy as other talking robots.” In the future, full-body poses and greater overall expressiveness will be necessary.

Speech Synthesis

The naturalness and expressiveness of the speech synthesis are quite satisfactory. In the future, utterances will be generated together with coordinated gestures and expressions.

Nonverbal Behavior

1. Explicit Expressions and Gestures

ERICA uses subtle, human-like facial expressions. Her hardware configuration makes very dramatic expressions difficult to produce. For everyday tasks, however, subtle expressions are likely to be more useful, especially given the relatively modest level of expressiveness in Japanese culture.

2. Implicit Behaviors

During ERICA’s interactions, implicit behavior modules were used to actuate breathing, blinking, gaze, speaking rhythm, and backchannel nodding. In the future, the modules will be improved and formalized for a variety of new implicit behaviors, such as motion control for laughter, unconscious fidgeting, and methods of expressing emotion implicitly through adjustments of gaze and body movement.
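The paper lists these behaviors without describing their models. As a hedged sketch, two of the simplest implicit behaviors can be generated procedurally: blinks at randomized intervals, and breathing as a slow periodic offset. The exponential interval distribution, the 4-second mean blink interval, the 15 breaths-per-minute rate, and the function names are all assumptions.

```python
import math
import random
from typing import List


def blink_times(duration_s: float, mean_interval_s: float = 4.0, seed: int = 0) -> List[float]:
    """Hypothetical blink schedule with exponentially distributed gaps between blinks."""
    rng = random.Random(seed)
    times: List[float] = []
    t = 0.0
    while True:
        # Draw the gap to the next blink; rate = 1 / mean interval.
        t += rng.expovariate(1.0 / mean_interval_s)
        if t >= duration_s:
            return times
        times.append(t)


def breathing_offset(t: float, rate_bpm: float = 15.0, amplitude: float = 0.05) -> float:
    """Hypothetical breathing motion: sinusoidal chest/shoulder offset at rate_bpm breaths/min."""
    return amplitude * math.sin(2.0 * math.pi * (rate_bpm / 60.0) * t)
```

Randomizing blink intervals avoids the mechanical regularity of a fixed timer. A fixed seed makes a schedule reproducible for testing.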

3. Multimodal Perception

The capabilities of ERICA’s sensor network were sufficient for this demonstration. In the future, we plan to capture the paralinguistic information conveyed by speech. This will require robust extraction of prosodic information in noisy environments.
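Pitch (fundamental frequency, F0) is one of the core prosodic features. The sketch below is a textbook autocorrelation pitch estimator, not the method used in ERICA. The function name and the 75 to 400 Hz search range are assumptions. On clean input it works well, but robustness in noisy environments is exactly the open problem noted above.

```python
import numpy as np


def estimate_f0(
    frame: np.ndarray, sample_rate: int, f0_min: float = 75.0, f0_max: float = 400.0
) -> float:
    """Estimate the fundamental frequency (Hz) of a voiced frame by autocorrelation.

    Searches lags corresponding to f0_min..f0_max; the frame should span at least
    two periods of the lowest expected pitch. Returns 0.0 for a silent frame.
    """
    x = frame.astype(np.float64) - np.mean(frame)
    if np.allclose(x, 0.0):
        return 0.0
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(x, x, mode="full")[len(x) - 1 :]
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(x) - 1)
    # The strongest peak in the allowed lag range is taken as one pitch period.
    best_lag = lag_min + int(np.argmax(ac[lag_min : lag_max + 1]))
    return sample_rate / best_lag
```

A practical system would add voicing detection and noise-robust weighting on top of this basic peak search.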

4. Desire and Intention

Currently, ERICA’s application logic is manually crafted as sequences of utterances. In the future, visual tools such as Interaction Composer [8] will be incorporated to assist with interaction design. Eventually, it will be necessary to generate behavior from representations of semantic meaning and of the robot’s own desires and intentions.

Conclusion

ERICA is arguably the most human-like android today, thanks to her visual design, facial expressiveness, and highly expressive speech synthesizer. Her sensing technologies are among the most capable to date, with high-performance speech recognition, the ability to discriminate between sound sources using microphone arrays, and precise tracking of people’s locations and movements. This work helps show what is possible with the current state of the art and identifies key issues, allowing researchers to understand the next steps on the path toward truly human-like androids.

References

[1] A. Hartholt, D. Traum, S. C. Marsella, A. Shapiro, G. Stratou, A. Leuski, L.-P. Morency, and J. Gratch, “All together now: Introducing the Virtual Human Toolkit,” in Intelligent Virtual Agents, 2013, pp. 368-381.
[2] S. Al Moubayed, J. Beskow, G. Skantze, and B. Granström, “Furhat: a back-projected human-like robot head for multiparty human-machine interaction,” in Cognitive Behavioural Systems. Springer, 2012, pp. 114-130.
[3] C. Breazeal, A. Brooks, J. Gray, G. Hoffman, C. Kidd, H. Lee, J. Lieberman, A. Lockerd, and D. Mulanda, “Humanoid robots as cooperative partners for people,” Int. Journal of Humanoid Robotics, vol. 1, pp. 1-34, 2004.
[4] D. Hanson, A. Olney, S. Prilliman, E. Mathews, M. Zielke, D. Hammons, R. Fernandez, and H. Stephanou, “Upending the uncanny valley,” in Proceedings of the National Conference on Artificial Intelligence, 2005, p. 1728.
[5] S. Nishio, H. Ishiguro, and N. Hagita, Geminoid: Teleoperated Android of an Existing Person. Vienna: INTECH Open Access Publisher, 2007.
[6] C. Becker-Asano and H. Ishiguro, “Evaluating facial displays of emotion for the android robot Geminoid F,” in 2011 IEEE Workshop on Affective Computational Intelligence (WACI), 2011, pp. 1-8.
[7] P. M. Prendergast, “Facial proportions,” in Advanced Surgical Facial Rejuvenation. Springer, 2012, pp. 15-22.
[8] D. F. Glas, S. Satake, T. Kanda, and N. Hagita, “An Interaction Design Framework for Social Robots,” in Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, 2011.

Analyst: Oscar Li | Editor: Joni Zhong | Localized by Synced Global Team: Xiang Chen
