M.Sc. Thesis Offer: Policy Shaping in Interactive Reinforcement Learning for Dynamical Systems Using Low-Dimensional Feedback Signals

As multiple studies have shown, providing feedback to autonomous learning agents can speed up learning. However, quantifying how different aspects of feedback, such as its quantity, quality, and temporal and spatial misalignment, affect learning speed, performance, and other relevant metrics is still an open question. This question not only concerns theoretical aspects of the learning algorithms but is also highly relevant for applications in real systems: although feedback is beneficial, (human) feedback is expensive and adds complexity to the system. Thus, it is essential to know the minimal requirements that (human) feedback must meet to yield an improvement in performance, learning speed, etc. that is large enough to justify the added complexity.

To address these challenges, this project poses a series of questions that can be addressed independently, each contributing to a deeper understanding of the role of feedback in a learning system's performance. Several assumptions and simplifications have been made to facilitate the study of these questions. These include the use of binary and low-dimensional feedback comparable to that used in the M-RoCK project. Also motivated by the M-RoCK project, the questions will be studied in a reaching task for a KUKA LBR iiwa, a robotic arm with 7 degrees of freedom (DoF). Configurations with 1 to 7 DoF will be used to study the effects of feedback at different levels of task complexity. The project will use artificial feedback and will primarily be conducted in simulated environments. Eventually, once a better understanding of the effects of feedback has been obtained, experiments with real users will be carried out.

Several thesis directions are possible, and these will be discussed with the candidates. These include:

Quantifying the Effect of Feedback Quantity on Interactive Reinforcement Learning (IRL) Performance

Quantifying the Effect of Feedback Accuracy on IRL Performance

Quantifying the Effect of Time-Delayed Feedback on IRL Performance

Policy Shaping for Dynamical Systems Using Low-Dimensional Feedback

Prior knowledge of or interest in:

Reinforcement Learning and Machine Learning

Human-Robot Interaction

Python, LaTeX, Git, Linux

Related Work:

Stahlhut, C., Navarro-Guerrero, N., Weber, C., & Wermter, S. (2015). Interaction in Reinforcement Learning Reduces the Need for Finely Tuned Hyperparameters in Complex Tasks. Kognitive Systeme, 3(2).


Deutsches Forschungszentrum für

Künstliche Intelligenz GmbH

Robotics Innovation Center

Robert-Hooke-Str. 1

28359 Bremen, Germany

Phone: +49 421 17845 4119

last updated 18.11.2019