This research proposal addresses the pressing need for efficient human-robot communication channels in the context of visual interaction. The work is motivated by the growing significance of human-robot interaction and its potential to transform many aspects of daily life. Although speech recognition and communication technologies have advanced considerably, integrating them smoothly into visual environments and capturing the nuances of human conversation remain challenging.
The fundamental problem tackled in this research goes beyond basic chatbot interactions and concerns the interplay of speech and visual modalities needed to support rich, meaningful conversations. Specifically, this study aims to close a knowledge gap by evaluating how Visual Voice Activity Detection (VVAD) enhances human-robot dialogue in real-world scenarios, particularly when visual cues are crucial.
The primary goal of this thesis is to investigate the effectiveness of integrating VVAD into human-robot interaction. This involves developing and implementing a dialogue system within a robotic framework and conducting an extensive user study. The study's core objective is to assess whether and how VVAD improves the quality and effectiveness of human-robot dialogue, yielding insight into its practical benefits and its potential impact on human-robot interaction.
The outcome of this research promises to advance human-robot interaction technology by exploring the potential of VVAD to bridge communication gaps in real-world settings. Ultimately, the insights gained will inform the development of more effective and user-friendly human-robot communication systems, with broad applications across industry and daily life.