Learning and Verifying Complex Behaviours for Humanoid Robots

RH5 Humanoid. Source: DFKI GmbH, Heiner Peters

The validation of systems based on deep learning for use in safety-critical applications is inherently difficult, since their subsymbolic mode of operation does not provide adequate levels of abstraction for representing and proving correctness. The VeryHuman project aims to synthesize such levels of abstraction by observing and analysing the upright walking behaviour of a two-legged humanoid robot. The theory to be developed serves as the starting point both for defining an appropriate reward function to optimally control the humanoid's movements by means of reinforcement learning, and for a verifiable abstraction of the corresponding kinematic models, which can be used to validate the robot's behaviour more easily.

Duration: 01.06.2020 until 31.05.2024
Grant recipient: DFKI GmbH
Sponsor: Federal Ministry of Education and Research
Grant number: 01IW20004
Partner: Cyber-Physical Systems (CPS) Research Department, German Research Center for Artificial Intelligence (DFKI)
Application Field: Assistance- and Rehabilitation Systems
Logistics, Production and Consumer
SAR- & Security Robotics
Space Robotics
Related Projects: D-Rock
Models, methods and tools for the model-based software development of robots (06.2015 - 05.2018)
TransFIT
Flexible interaction for infrastructure establishment by means of teleoperation and direct collaboration; transfer into Industry 4.0 (07.2017 - 06.2021)
Q-Rock
AI-based qualification of deliberative behaviour for a robotic construction kit (08.2018 - 07.2021)
Related Software: HyRoDyn
Hybrid Robot Dynamics
MARS
Machina Arte Robotum Simulans
NDLCom
Node Level Data Link Communication
Rock
Robot Construction Kit

Project details

Overall workflow in the project VeryHuman. Source: DFKI GmbH, Foto: Daniel Harnack
Biologically inspired control algorithms for robots have proven very successful. Often, these algorithms use techniques such as reinforcement learning or optimal control to perform sophisticated movement patterns with a robot (e.g., humanoid walking). However, two main challenges exist for these learning-based approaches:
  • First, robust robot hardware along with an accurate simulation of the system is required. For example, the robot can be subject to a large number of holonomic constraints, including internal closed loops and external contacts, which pose challenges to the accuracy of the simulation.
  • Second, control algorithms of this kind can be hard to implement due to a lack of knowledge of rewards and constraints. As an example, consider the upright walking movement of a two-legged humanoid robot. It is not immediately clear how one can specify the task of “upright walking”. We might try to relate different body parts (head above shoulders, shoulders above waist, waist above legs) or use physical stability criteria (centre of pressure, zero moment point, etc.), but do these really specify walking, and what are its non-trivial properties? This leads to the non-trivial task of defining a suitable reward function for (deep) reinforcement learning approaches, or a cost function for optimal control approaches, along with constraints.
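To make the difficulty concrete, the body-part relations and stability criteria mentioned above could be combined into a candidate reward function. The following is a purely illustrative sketch, not the project's actual reward: the terms, weights, and function names are our own assumptions, and choosing them well is precisely the open problem the project addresses.

```python
import numpy as np

def upright_walking_reward(head_z, shoulder_z, waist_z,
                           com_xy, support_center_xy,
                           forward_velocity,
                           w_posture=1.0, w_stability=1.0, w_progress=1.0):
    """Toy reward for 'upright walking', combining three heuristic terms.

    All weights and terms are illustrative assumptions, not a validated
    specification of walking.
    """
    # Postural term: reward the vertical ordering head > shoulders > waist.
    posture = float(head_z > shoulder_z) + float(shoulder_z > waist_z)
    # Stability proxy: penalise the horizontal distance between the centre
    # of mass and the centre of the support polygon.
    stability = -np.linalg.norm(np.asarray(com_xy) - np.asarray(support_center_xy))
    # Progress term: reward forward velocity.
    progress = forward_velocity
    return w_posture * posture + w_stability * stability + w_progress * progress
```

Even this simple sketch exposes the problem stated above: the thresholded posture term gives no gradient information, the stability proxy ignores dynamic criteria such as the zero moment point, and nothing here rules out degenerate gaits that satisfy all three terms without actually walking.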
This project addresses three basic research questions:
  • How can we formulate and prove properties of a complex humanoid robot?
  • How can we efficiently combine reinforcement learning and optimal control-based approaches?
  • How can we make use of symbolic properties to derive a reward function for a deep reinforcement learning approach, or a cost function for an optimal control approach, for complex use cases such as humanoid walking?
These three research questions are closely interwoven and are dealt with in three work areas that lead to the overarching goal of the project: a methodology for developing a hybrid of deep (reinforcement) learning and optimization-based control of a robot, together with a corresponding rational reconstruction of its observed and future behaviour. This reconstruction is based on observations of the robot's movements and general knowledge of physics (rigid body dynamics). The overall approach is shown in Fig. 2. The demonstration scenario includes showing walking for the complex series-parallel humanoid robot RH5, which was recently developed at DFKI-RIC (see Fig. 1).
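One way the third research question might be approached is via quantitative (robustness) semantics for symbolic properties, as used in signal temporal logic: each property yields a signed margin saying how strongly it holds, and margins compose into a reward. The sketch below is a minimal illustration of that idea under our own assumptions; it is not the project's published method, and all names are hypothetical.

```python
import math

def robustness_above(z_upper, z_lower, margin=0.0):
    # Signed margin by which the ordering constraint "upper above lower"
    # holds (positive) or is violated (negative).
    return (z_upper - z_lower) - margin

def robustness_and(*values):
    # Conjunction under robustness semantics: the weakest margin dominates.
    return min(values)

def property_reward(head_z, shoulder_z, waist_z, beta=5.0):
    """Reward derived from the symbolic property
    'head above shoulders AND shoulders above waist'."""
    rho = robustness_and(robustness_above(head_z, shoulder_z),
                         robustness_above(shoulder_z, waist_z))
    # Squash the margin into a bounded, smooth reward so a learner gets
    # useful gradient information near the property boundary.
    return math.tanh(beta * rho)
```

Unlike a hand-tuned reward, a robustness-based reward inherits its structure from the property itself, which is what would make a later proof of correctness tractable: a positive reward certifies, by construction, that the symbolic property holds.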

last updated 28.09.2020