Hindsight Off-policy Options – Disentangling Abstractions in the Option Framework
Deep reinforcement learning has seen numerous successes in recent years (e.g. Go, Chess, Atari, StarCraft), but still faces challenges in domains with limited data access.
We will discuss the perspective of using hierarchy and abstractions to address these challenges and improve data efficiency. In particular, we will look into a simple optimisation scheme that can easily be applied to train hierarchical policies. In optimising hierarchical agents, we will further investigate one of the most common methods for hierarchical RL: the options framework. At its core, the options framework divides an agent into a combination of low-level and high-level controllers and introduces a form of action abstraction: it effectively reduces the high-level controller's task to choosing from a discrete set of reusable sub-policies. It also enables temporal abstraction by explicitly modelling the temporal continuation of low-level behaviours. We will dive into which design decisions help with robustness and data efficiency, and into further work disentangling the benefits of temporal and action abstraction.
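To make the two abstractions concrete, here is a minimal, illustrative sketch of an options-style agent (names and the stochastic termination rule are assumptions for illustration, not the talk's actual method): the high-level controller only picks among a discrete set of sub-policies (action abstraction), and each chosen option keeps running across timesteps until it terminates (temporal abstraction).

```python
import random


class Option:
    """A reusable sub-policy with its own termination condition.
    (Illustrative structure, not the talk's implementation.)"""

    def __init__(self, name, policy, termination_prob):
        self.name = name
        self.policy = policy                  # maps state -> low-level action
        self.termination_prob = termination_prob

    def act(self, state):
        return self.policy(state)

    def terminates(self, rng):
        # Temporal abstraction: the option continues over several
        # timesteps and only ends stochastically.
        return rng.random() < self.termination_prob


class OptionsAgent:
    """High-level controller that chooses among a discrete set of
    options (action abstraction) and re-decides only when one ends."""

    def __init__(self, options, high_level_policy, seed=0):
        self.options = options
        self.high_level_policy = high_level_policy  # state -> option index
        self.rng = random.Random(seed)
        self.current = None

    def step(self, state):
        if self.current is None or self.current.terminates(self.rng):
            self.current = self.options[self.high_level_policy(state)]
        return self.current.act(state)


# Toy usage: two options, chosen by the sign of a scalar state.
left = Option("left", lambda s: -1, termination_prob=0.1)
right = Option("right", lambda s: +1, termination_prob=0.1)
agent = OptionsAgent([left, right],
                     high_level_policy=lambda s: 0 if s < 0 else 1)

# The "left" option persists even after the state turns positive,
# because the high-level controller only re-decides on termination.
actions = [agent.step(s) for s in [-2.0, -1.5, 3.0, 4.0]]
```

With a low termination probability, the trace shows the temporal continuation the abstract refers to: one high-level decision governs many low-level actions.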
Bio
Markus Wulfmeier is a research scientist at Google DeepMind focusing on efficient machine learning via decomposition and modularity, as well as the different dimensions of learning to transfer information, including but not restricted to sim2real, domain adaptation and the extraction of knowledge from demonstration.
Before joining DeepMind, he was a postdoctoral research scientist at the Oxford Robotics Institute and a member of Oxford University's New College. In 2017, he was a visiting scholar with the UC Berkeley Artificial Intelligence Research lab.
The principal focus of his PhD research was developing approaches to increase the efficiency of providing supervision to guide autonomous systems, with particular emphasis on transfer learning and learning from demonstration; this work was awarded Best Student Paper at IROS 2016.
Furthermore, in early 2016, he was fortunate to lead ORI's path planning software development for the presentation of a self-driving prototype at the Shell Eco-marathon (SEM). This work paved the way for the introduction of a new autonomous challenge category at the SEM, scheduled for 2018.
Active in the field of robotics since 2010, he has previously been part of research efforts on space exploration robots, GPU-based simulations and robotic platforms for first responders, as well as mobile autonomy, at leading research institutions including MIT, ETHZ and the University of Oxford.