This thesis critically examines the integration of Visual Voice Activity Detection (VVAD) technology into human-robot dialogue systems, focusing on its potential to enhance communication in noisy environments. While traditional speech recognition systems excel in quiet settings, their performance declines significantly amid background noise, highlighting the need for dialogue systems that can interpret speech with the aid of visual cues. This research emphasizes the application of VVAD within human-robot interaction (HRI), aiming to improve speech recognition accuracy and facilitate more natural conversations across sectors such as healthcare, education, and domestic assistance.
The conceptual and technical underpinnings of VVAD are explored, along with its practical implementation in HRI, to gauge its effectiveness in real-world scenarios. By incorporating VVAD into a dialogue system, this study conducts an extensive user evaluation to assess its impact on the quality and coherence of human-robot dialogue. The findings are expected to demonstrate the substantial benefits of a multimodal approach to human-robot conversation, with VVAD serving as one such modality that narrows the communication gap between humans and robots.
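To make the multimodal idea concrete, the sketch below shows one minimal way a visual cue could be fused with audio-based voice activity detection. It is an illustrative assumption only: the motion-energy heuristic, the function names (visual_voice_activity, fuse), and the threshold are invented for this example and do not describe the detection model evaluated in the thesis.

```python
import numpy as np

def visual_voice_activity(mouth_frames, motion_thresh=12.0):
    """Toy visual cue: mean absolute inter-frame difference over a mouth crop.

    mouth_frames: array of shape (T, H, W), grayscale crops of the
    speaker's mouth region. Returns one boolean per frame pair that is
    True where visible lip motion suggests the person is speaking.
    """
    diffs = np.abs(np.diff(mouth_frames.astype(np.float32), axis=0))
    motion = diffs.mean(axis=(1, 2))   # motion energy per frame pair
    return motion > motion_thresh

def fuse(audio_vad, visual_vad):
    """AND-fusion of the two cues, assuming both boolean streams are
    already aligned to the same frame rate. Speech is accepted only
    when acoustic energy and lip motion agree, which suppresses
    background speakers who are not facing the robot's camera."""
    return audio_vad & visual_vad

# Hypothetical usage: five fake mouth crops and an audio VAD stream.
frames = np.random.randint(0, 255, size=(5, 32, 32), dtype=np.uint8)
audio = np.array([True, True, False, True])  # one flag per frame pair
print(fuse(audio, visual_voice_activity(frames)))
```

AND-fusion is the strictest possible policy; a deployed system might instead weight the two cues or learn the fusion, but the sketch captures why a visual channel can help when the audio channel alone is ambiguous.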
It is important to note, however, that the challenges posed by noise and visual complexity in communication settings cannot be eliminated entirely. VVAD represents a significant step forward in addressing these issues, offering improvements in the interaction dynamics between humans and robots. Nevertheless, this technology is one step in the ongoing effort to perfect human-robot communication rather than a definitive solution.
This thesis contributes to the field of HRI by underscoring the transformative potential of VVAD in enhancing our interactions with robots, making them more responsive to the nuances of human communication. The insights gained from this research not only advocate for the incorporation of VVAD into future robot designs but also open avenues for further exploration into multimodal interaction systems. This work highlights the crucial role of visual cues in improving speech recognition within complex and unpredictable environments, setting the stage for more immersive and effective human-robot dialogue.
Enhancing Human-Robot Dialogue with Visual Voice Activity Detection