Profile

I am currently a researcher on the Moonshot project, a long-term research program which aims to transform society by 2050. The focus of my work is to create a semi-autonomous conversational agent which handles simple conversation tasks on its own, but can also recognize more difficult tasks and hand over control to a remote human operator. The operator should be able to control multiple avatars at once, enabling parallel conversations and hopefully improving task efficiency.

I am also interested in making conversational agents more human-like through non-linguistic behaviors such as backchannels, turn-taking and laughter. We have created behaviors for an attentive listening agent so that it can show empathy towards the user. This listening agent has been used in several robots and virtual agents.

ERICA

ERICA uses a spherical microphone array and a Kinect sensor to track users, recognize their speech and evaluate their attention, eye gaze and other non-verbal behaviors. This means the user can converse without a hand-held microphone, which leads to more natural non-verbal behavior during conversation. We have performed several public demonstrations of ERICA's conversational abilities, both in live settings and remotely.

Conversational roles:

ERICA is being developed for a number of human roles, each with different conversational requirements:

  • Attentive listening: Listening to the user while producing backchannels and responses that repeat part of what they said.
  • Job Interviewer: Extracting keywords from the user's answers to produce appropriate follow-up questions.
  • Lab Guide: A simple question-answering system that introduces users to our lab.
  • Speed Dating: Mixed-initiative conversation in which ERICA assesses the user as a potential partner.
  • Wikitalk: Users ask ERICA about topics on Wikipedia.

Conversational models:

We have created interaction models to give ERICA human-like conversational abilities. Such models are especially important for a situated conversational robot, in contrast to chatbots or virtual assistants such as Google Home or Siri.

We recently developed a shared laughter model to predict if, when and how a robot should laugh in response to the user's laughter. This model allows ERICA and other robots to laugh along with the user; a rough sketch of the decisions involved follows the links below. Our paper was featured in several international media outlets and science magazines, including:

The Guardian

Science News Explores

Inverse
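
As a rough illustration of the three decisions this involves, the sketch below chains a laugh detector, a decision on whether to join in, and a choice of laugh type. The class names, thresholds and probability inputs are placeholders for illustration, not the trained models from the paper.

    # Hedged sketch of the three decisions described above: detect a user laugh,
    # decide whether to laugh back, and choose how to laugh. The probabilities
    # would come from trained classifiers; here they are placeholder inputs.

    from dataclasses import dataclass

    @dataclass
    class LaughterEstimate:
        p_user_laughed: float   # was the last segment a user laugh? ("when")
        p_share: float          # should the robot join in? ("if")
        p_mirthful: float       # mirthful vs. polite social laugh ("how")

    def respond_to_laughter(est: LaughterEstimate,
                            detect_th: float = 0.5,
                            share_th: float = 0.5,
                            type_th: float = 0.5) -> str:
        """Return the robot's laughter action for the current utterance."""
        if est.p_user_laughed < detect_th:
            return "no_laugh"          # no user laugh detected
        if est.p_share < share_th:
            return "no_laugh"          # user laughed, but the robot stays quiet
        return "mirthful_laugh" if est.p_mirthful >= type_th else "social_laugh"

    print(respond_to_laughter(LaughterEstimate(0.9, 0.8, 0.3)))  # social_laugh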

We created a turn-taking model which predicts whether the user has finished their conversational turn, using both the acoustic signal of their speech and the lexical content of the speech recognition output. These two modalities are combined in a fusion model. We found that the fusion model significantly outperforms the baseline and improves further when used together with a finite-state turn-taking machine. In our current implementation ERICA can respond to the user in less than a second, with a 5% interruption rate.
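
To make the idea concrete, here is a minimal sketch of late fusion gated by a finite-state machine. The weights, thresholds and timings are illustrative placeholders rather than the values used in the actual system.

    # Hedged sketch: late fusion of acoustic and lexical end-of-turn predictors,
    # gated by a simple finite-state turn-taking machine. All names and numbers
    # here are illustrative, not the actual ERICA implementation.

    from dataclasses import dataclass

    @dataclass
    class EndOfTurnEstimate:
        p_acoustic: float  # P(turn finished) from prosodic/acoustic features
        p_lexical: float   # P(turn finished) from the ASR hypothesis so far

    def fuse(est: EndOfTurnEstimate, w_acoustic: float = 0.5) -> float:
        """Late fusion of the two modalities (a weighted average here;
        a learned combiner could be substituted)."""
        return w_acoustic * est.p_acoustic + (1.0 - w_acoustic) * est.p_lexical

    class TurnTakingFSM:
        """Two-state machine: LISTEN while the user holds the floor,
        RESPOND once a pause coincides with a confident end-of-turn estimate."""

        def __init__(self, threshold: float = 0.7, min_silence_ms: int = 300):
            self.threshold = threshold
            self.min_silence_ms = min_silence_ms
            self.state = "LISTEN"

        def update(self, est: EndOfTurnEstimate, silence_ms: int) -> str:
            if self.state == "LISTEN":
                if silence_ms >= self.min_silence_ms and fuse(est) >= self.threshold:
                    self.state = "RESPOND"
            return self.state

    fsm = TurnTakingFSM()
    print(fsm.update(EndOfTurnEstimate(p_acoustic=0.9, p_lexical=0.8), 400))  # RESPOND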

Our engagement model recognizes several types of user behavior and produces a likelihood that the user is engaged in the conversation. With this information we can manage ERICA's behavior by deciding whether she should continue with the current topic or move on to something else.
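
A toy version of this decision might look as follows; the behavior cues, weights and threshold are illustrative stand-ins for the trained engagement model.

    # Hedged sketch: map observed listener behaviors to an engagement score and
    # a simple topic-management policy. Cues, weights and threshold are made up
    # for illustration.

    import math

    CUE_WEIGHTS = {          # behaviors treated as evidence of engagement
        "nodding": 1.2,
        "laughter": 1.5,
        "gaze_at_robot": 1.0,
        "verbal_backchannel": 0.8,
    }
    BIAS = -2.0              # prior towards "not engaged" when no cues are seen

    def engagement_probability(observed: set) -> float:
        """Logistic combination of the cues detected in the current window."""
        score = BIAS + sum(w for cue, w in CUE_WEIGHTS.items() if cue in observed)
        return 1.0 / (1.0 + math.exp(-score))

    def next_action(observed: set, threshold: float = 0.5) -> str:
        """Stay on the topic while the user seems engaged, otherwise move on."""
        p = engagement_probability(observed)
        return "continue_topic" if p >= threshold else "switch_topic"

    print(next_action({"nodding", "gaze_at_robot"}))  # continue_topic
    print(next_action(set()))                         # switch_topic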

For realistic attentive listening, we created a backchannel model which identifies appropriate timings for Japanese backchannels (aizuchi, 相槌). This model is used while ERICA is listening to the user, and she can also produce multimodal backchannels such as head nodding. We also developed a statement response model for attentive listening which identifies a focus word in the user's speech and then produces an appropriate response using that focus word. Used together with the backchannel model, this lets ERICA take the role of an attentive listener and stimulate the user to continue talking. The system is domain-independent, so the user can talk about any topic.
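
The sketch below illustrates the focus-word idea with a toy heuristic and fixed templates; the real model identifies the focus word and chooses responses with trained components, but the flow is the same: find a focus, build a response around it, and fall back to a backchannel when no focus is found.

    # Hedged sketch: a focus-word statement response. The focus extraction is a
    # toy heuristic over pre-tagged tokens, and the templates are placeholders.

    from typing import Optional

    RESPONSE_TEMPLATES = [
        "{focus}ですか。",        # echo question: "Oh, {focus}?"
        "{focus}はいいですね。",  # assessment: "{focus} sounds nice."
    ]

    def extract_focus(tagged_tokens) -> Optional[str]:
        """Return the last noun in the utterance as a stand-in focus word."""
        nouns = [tok for tok, pos in tagged_tokens if pos == "NOUN"]
        return nouns[-1] if nouns else None

    def respond(tagged_tokens) -> str:
        focus = extract_focus(tagged_tokens)
        if focus is None:
            return "うん、うん。"  # no focus found: fall back to a backchannel
        return RESPONSE_TEMPLATES[0].format(focus=focus)

    # "昨日、京都で映画を見ました" ("I watched a movie in Kyoto yesterday")
    utterance = [("昨日", "NOUN"), ("京都", "NOUN"), ("で", "ADP"),
                 ("映画", "NOUN"), ("を", "ADP"), ("見ました", "VERB")]
    print(respond(utterance))  # 映画ですか。 ("Oh, a movie?")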

Public Demonstrations and Media:

I demonstrated ERICA and Wikitalk at IJCAI 2019.

Our public symposium with ERICA has been featured in Japanese media! Links are below (Japanese only):

Asahi Shinbun

Nikkei News

Sankei News

Kyoto News

ERICA as a job interviewer


Virtual Basketball (PhD work)

For my doctoral studies I created a virtual basketball game to analyze interactions between human users and agent players. My system uses a Kinect sensor, immersive displays and a foot pressure sensor to allow the user to play basketball without the need for a keyboard or mouse. I created a gesture recognition system and integrated speech recognition for multimodal play.
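
As a simplified illustration, the sketch below recognizes a "shoot" gesture from skeleton joints and combines it with a recognized speech command; the joint names, thresholds and command set are invented for the example rather than taken from the original game.

    # Hedged sketch: rule-based recognition of a "shoot" gesture from Kinect-style
    # skeleton joints, fused with a speech command for multimodal play.

    from dataclasses import dataclass

    @dataclass
    class Joint:
        x: float
        y: float  # vertical position in metres; larger means higher
        z: float

    def is_shoot_gesture(joints) -> bool:
        """Treat 'both hands raised above the head' as a shooting motion."""
        head = joints["head"]
        return (joints["hand_left"].y > head.y and
                joints["hand_right"].y > head.y)

    def fuse_with_speech(joints, speech: str) -> str:
        """Combine the body channel and the speech channel: either one can
        trigger the action."""
        if is_shoot_gesture(joints) or "shoot" in speech.lower():
            return "SHOOT"
        return "NO_ACTION"

    frame = {
        "head":       Joint(0.0, 1.70, 2.0),
        "hand_left":  Joint(-0.2, 1.90, 2.0),
        "hand_right": Joint(0.2, 1.95, 2.0),
    }
    print(fuse_with_speech(frame, "pass it to me"))  # SHOOT (gesture channel fires)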

I used virtual basketball to examine various aspects of human-agent interaction such as:

  • Application of real-world theories to virtual world interactions
  • Analysis of joint action behaviors between people in the virtual world
  • Evaluation of agent body communication
  • Multimodal analysis, particularly speech + body movement

My main research deals with the theory, design, implementation and evaluation of virtual agents and robots. I have also been part of several other projects, including designing cultural characters, virtual and robot telepresence, and eye gaze tracking. Please contact me if you are interested in collaboration.

lala@sap.ist.i.kyoto-u.ac.jp