Multimodal Human-Robot Interaction Combining Speech, Facial Expressions, and Eye Gaze

Publication date
2021
Type of student thesis
Master
Type
11 - Student thesis
Publisher / Publishing institution
Hochschule für Wirtschaft FHNW
Place of publication / Event location
Olten
Abstract
Human-Robot Interaction (HRI) is being applied in more and more areas as technology continues to advance. While early dialogue systems only recognized spoken or text input, the field has shifted towards multimodal dialogue systems. In essence, this means that multiple input channels are captured: in addition to the verbal input, various nonverbal channels such as the human’s facial expression are taken into account. The artifact developed during this Master thesis consists of a multimodal dialogue system involving Pepper, a humanoid robot developed by SoftBank Robotics (n.d.-a), and input channels capturing the human’s speech, facial expression, and eye gaze. Through a modular architecture based on network communication, the collected inputs are combined and sent to Rasa (2021), an open-source conversational agent running on an intermediate server. Once Pepper receives the response selected by Rasa, it performs a body language animation and displays an emoji matching the social context on the attached tablet while simultaneously speaking the response to the interaction partner. The results of the evaluation phase suggest that while speech and eye gaze recognition achieve high levels of accuracy, the facial expression recognition component cannot provide the same reliability. In addition to the facial expression recognition concept proposed by SoftBank Robotics (n.d.-g), two alternative approaches were defined by the author of this Master thesis and require further evaluation. The overall response times of the multimodal system remain low, with Rasa requiring the majority of the time to select the appropriate response.
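
As an illustration of the architecture described in the abstract, the following minimal sketch shows how the three input channels could be combined into a single request to a Rasa server over the network. It uses Rasa's standard REST channel (POST /webhooks/rest/webhook); the function name, the metadata keys, and the example values are hypothetical assumptions for illustration and are not taken from the thesis itself.

import requests

# Assumed address of the intermediate server running Rasa; the REST
# channel exposes this webhook by default.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"

def send_multimodal_input(speech_text, facial_expression, eye_gaze,
                          sender_id="pepper"):
    """Combine the verbal and nonverbal channels into one Rasa request.

    The nonverbal channels travel in the optional 'metadata' field of
    the REST payload, where server-side custom actions can read them
    to select a socially appropriate response.
    """
    payload = {
        "sender": sender_id,
        "message": speech_text,
        "metadata": {
            "facial_expression": facial_expression,  # e.g. "happy" (hypothetical label)
            "eye_gaze": eye_gaze,                    # e.g. "engaged" (hypothetical label)
        },
    }
    response = requests.post(RASA_URL, json=payload, timeout=5)
    response.raise_for_status()
    # Rasa answers with a list of messages for the robot to utter.
    return response.json()

if __name__ == "__main__":
    for message in send_multimodal_input("Hello there!", "happy", "engaged"):
        print(message.get("text", message))

In the system described by the thesis, each returned message would then be mapped on Pepper's side to spoken output, a body language animation, and an emoji on the attached tablet.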
Language
English
Created during FHNW affiliation
Yes
Citation
Applewhite, T. (2021). Multimodal Human-Robot Interaction Combining Speech, Facial Expressions, and Eye Gaze [Hochschule für Wirtschaft FHNW]. https://irf.fhnw.ch/handle/11654/40432