Multimodal Human-Robot Interaction Combining Speech, Facial Expressions, and Eye Gaze

dc.contributor.author: Applewhite, Timothy
dc.contributor.mentor: Dornberger, Rolf
dc.contributor.mentor: Zhong, Jia
dc.date.accessioned: 2023-12-22T16:04:07Z
dc.date.available: 2023-12-22T16:04:07Z
dc.date.issued: 2021
dc.description.abstract: Human-Robot Interaction (HRI) is being applied in an increasing number of areas as technology continues to advance. While early dialogue systems only recognized a spoken or text input, the field has shifted towards multimodal dialogue systems, which capture several input channels at once: in addition to the verbal input, nonverbal channels such as the human's facial expression are also processed. The artifact developed during this Master thesis is a multimodal dialogue system built around Pepper, a humanoid robot developed by SoftBank Robotics (n.d.-a), with input channels capturing the human's speech, facial expression, and eye gaze. Using a modular architecture based on network communication, the collected inputs are combined and sent to Rasa (2021), an open-source conversational agent running on an intermediate server. Upon receiving the response selected by Rasa, Pepper performs a body language animation and displays an emoji matching the social context on the attached tablet, while simultaneously speaking the response to the interaction partner. The results of the evaluation phase suggest that while speech and eye gaze recognition achieve high levels of accuracy, the facial expression recognition component cannot provide the same reliability. In addition to the facial expression recognition concept proposed by SoftBank Robotics (n.d.-g), two further approaches were defined by the author of this Master thesis and require further evaluation. The overall response times of the multimodal system remain low, with Rasa requiring the majority of the time to select the appropriate response.
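The architecture described in the abstract (combining speech, facial expression, and eye gaze inputs and forwarding them to a Rasa server over the network) could be sketched roughly as below. This is an illustrative assumption, not the thesis's actual implementation: the endpoint is Rasa's standard REST channel (`/webhooks/rest/webhook`), and the function names, the `metadata` keys (`facial_expression`, `eye_gaze`), and the example values are hypothetical.

```python
import json
import urllib.request

# Assumed default URL of Rasa's built-in REST channel on the intermediate server.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_multimodal_message(sender_id, utterance, facial_expression, gaze_on_robot):
    """Combine the three input channels into one payload for Rasa.

    The verbal input goes into the `message` field; the nonverbal
    channels are attached under `metadata`, which Rasa's REST channel
    passes along with the message. The metadata key names here are
    hypothetical, chosen only for this sketch.
    """
    return {
        "sender": sender_id,
        "message": utterance,
        "metadata": {
            "facial_expression": facial_expression,  # e.g. "happy", "neutral"
            "eye_gaze": gaze_on_robot,               # e.g. True if looking at Pepper
        },
    }


def send_to_rasa(payload, url=RASA_URL, timeout=5):
    """POST the combined payload and return Rasa's list of response messages."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In such a design, Pepper-side code would only need to call `build_multimodal_message` and `send_to_rasa`, keeping the robot decoupled from the dialogue logic running on the server, which matches the modular, network-based architecture the abstract describes.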
dc.identifier.uri: https://irf.fhnw.ch/handle/11654/40432
dc.language.iso: en
dc.publisher: Hochschule für Wirtschaft FHNW
dc.spatial: Olten
dc.subject.ddc: 330 - Economics
dc.title: Multimodal Human-Robot Interaction Combining Speech, Facial Expressions, and Eye Gaze
dc.type: 11 - Student thesis
dspace.entity.type: Publication
fhnw.InventedHere: Yes
fhnw.PublishedSwitzerland: Yes
fhnw.StudentsWorkType: Master
fhnw.affiliation.hochschule: Hochschule für Wirtschaft
fhnw.affiliation.institut: Master of Science
relation.isMentorOfPublication: 64196f63-c326-4e10-935d-6776cc91354c
relation.isMentorOfPublication: ff69a9ff-aabe-477a-bdde-e900fee2f7e0
relation.isMentorOfPublication.latestForDiscovery: 64196f63-c326-4e10-935d-6776cc91354c