Cross-lingual Voice Activity Detection for Human-Robot Interaction
Nils Höfling, Su-Kyoung Kim, Elsa Andrea Kirchner
In The Third Neuroadaptive Technology Conference (NAT-2022), 09.10.-12.10.2022, Lübbenau, n.n., pages 100-103, Nov/2022.

Abstract:

The recognition of language is a two-step process: speech must first be recognized as such (1, 2), and then its semantics must be understood. For human-robot interaction, voice activity detection (VAD) is of great importance (3). Once it is known that a human is talking, speech recognition can be triggered, and additional modules in the robot can produce responses to the human or other robotic behaviors. For online interaction with precise timing, especially when using multimodal data (4), it might also be necessary to integrate VAD into a microcontroller or similar embedded system in the robot. Advanced methods exist to enable online and embedded VAD (3). However, some of these methods are trained on biased data, i.e., data in a single language, usually English, which can cause problems in applications where the interacting human speaks a different language. This issue is well investigated for speech recognition (5) but poorly for VAD. Language-related issues need to be considered in some applications, such as supporting patients in non-English-speaking environments, and may be as important as approaches that handle strong background noise (6, 7).

Keywords:

voice activity detection, speech recognition, embedded, robot



