Multimodal Human-Robot Interaction Combining Speech, Facial Expressions, and Eye Gaze

Publication date
2021
Type of student thesis
Master
Type
11 - Student thesis
Publisher / Publishing institution
Hochschule für Wirtschaft FHNW
Place of publication / Event location
Olten
Abstract
Human-Robot Interaction (HRI) is being applied in more and more areas as technology continues to advance. While early dialogue systems only recognized spoken or text input, the field has shifted towards multimodal dialogue systems. In essence, this means that multiple input channels are captured: in addition to the verbal input, various nonverbal channels such as the human’s facial expression are taken into account. The artifact developed during this Master thesis consists of a multimodal dialogue system involving Pepper, a humanoid robot developed by SoftBank Robotics (n.d.-a), and input channels capturing the human’s speech, facial expression, and eye gaze. Through a modular architecture based on network communication, the collected inputs are combined and sent to Rasa (2021), an open-source conversational agent running on an intermediate server. Once Pepper receives the response selected by Rasa, it performs a body language animation and displays an emoji matching the social context on the attached tablet while simultaneously speaking the response to the interaction partner. The results of the evaluation phase suggest that while speech and eye gaze recognition achieve high levels of accuracy, the facial expression recognition component cannot provide the same reliability. In addition to the facial expression recognition concept proposed by SoftBank Robotics (n.d.-g), two alternative approaches were defined by the author of this Master thesis and require further evaluation. The overall response times of the multimodal system remain low, with Rasa requiring the majority of the time to select the appropriate response.
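
As an illustration of the architecture described in the abstract, the following minimal sketch shows how the three input channels could be combined into a single request to a Rasa server over the network. It uses Rasa's standard REST channel (POST /webhooks/rest/webhook); the function name, the metadata keys, and the example values are hypothetical assumptions for illustration and are not taken from the thesis itself.

import requests

# Assumed address of the intermediate server running Rasa; the REST
# channel exposes this webhook by default.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"

def send_multimodal_input(speech_text, facial_expression, eye_gaze,
                          sender_id="pepper"):
    """Combine the verbal and nonverbal channels into one Rasa request.

    The nonverbal channels travel in the optional 'metadata' field of
    the REST payload, where server-side custom actions can read them
    to select a socially appropriate response.
    """
    payload = {
        "sender": sender_id,
        "message": speech_text,
        "metadata": {
            "facial_expression": facial_expression,  # e.g. "happy" (hypothetical label)
            "eye_gaze": eye_gaze,                    # e.g. "engaged" (hypothetical label)
        },
    }
    response = requests.post(RASA_URL, json=payload, timeout=5)
    response.raise_for_status()
    # Rasa answers with a list of messages for the robot to utter.
    return response.json()

if __name__ == "__main__":
    for message in send_multimodal_input("Hello there!", "happy", "engaged"):
        print(message.get("text", message))

In the system described by the thesis, each returned message would then be mapped on Pepper's side to spoken output, a body language animation, and an emoji on the attached tablet.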
Language
English
Created during FHNW affiliation
Yes
Citation
Applewhite, T. (2021). Multimodal Human-Robot Interaction Combining Speech, Facial Expressions, and Eye Gaze [Hochschule für Wirtschaft FHNW]. https://irf.fhnw.ch/handle/11654/40432