Real-world Sequential Decision Making with High-dimensional, Noisy Observations
In this talk, we present a general overview of our work on real-world sequential decision making. Specifically, we focus on sequential decision making with high-dimensional, noisy observations, with real-world applications ranging from robotics and recommendation systems to dialogue management.
Our main contribution is to develop new methods for learning representations that solve high-dimensional control problems with unknown dynamics. Such problems are common in real-world applications, where systems take high-dimensional (sensory) feedback as input. Our proposed approaches first embed the high-dimensional observations into a lower-dimensional latent representation space, estimate a latent dynamics model, and then use this model for control in the latent space. An important open question here is how to learn a representation that is amenable to existing control algorithms. By formulating and analyzing the representation learning problem from an optimal control perspective, we establish three underlying principles that the learned representation should satisfy: 1) accurate prediction in the observation space, 2) consistency between latent-space and observation-space dynamics, and 3) low curvature in the latent-space transitions. These principles naturally correspond to a novel representation learning loss function that consists of three terms: prediction, consistency, and curvature (PCC). Extensive experiments on benchmark domains demonstrate that the new variational-PCC learning algorithm enjoys significantly more stable and reproducible training and leads to superior control performance. Beyond these paradigms, we will further explore several directions of embed-to-control, including its connections to maximum-information representation learning and model-based reinforcement learning.
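To make the three loss terms concrete, here is a minimal sketch of a PCC-style objective for a single transition. The names `encode`, `decode`, and `latent_step` are hypothetical placeholders for learned models, the terms are unweighted, and the curvature term is a simple second-difference probe (which vanishes for locally affine latent dynamics) rather than the variational formulation discussed in the talk.

```python
import numpy as np

def pcc_loss(encode, decode, latent_step, x, u, x_next,
             curvature_eps=0.01, rng=None):
    """Sketch of the three PCC terms for one (x, u, x_next) transition.

    encode      : observation x -> latent z          (hypothetical callable)
    decode      : latent z      -> observation x_hat (hypothetical callable)
    latent_step : (z, u)        -> next latent z'    (hypothetical callable)
    """
    rng = rng or np.random.default_rng(0)
    z = encode(x)
    z_next_pred = latent_step(z, u)

    # 1) Prediction: decoding the predicted next latent state should
    #    reproduce the actual next observation.
    pred = np.sum((decode(z_next_pred) - x_next) ** 2)

    # 2) Consistency: stepping in latent space should agree with
    #    encoding the actual next observation.
    cons = np.sum((encode(x_next) - z_next_pred) ** 2)

    # 3) Curvature: penalize deviation of the latent dynamics from a
    #    locally affine map, probed here with a symmetric second
    #    difference around (z, u); zero when latent_step is affine.
    dz = curvature_eps * rng.standard_normal(np.shape(z))
    du = curvature_eps * rng.standard_normal(np.shape(u))
    curv = np.sum((latent_step(z + dz, u + du)
                   + latent_step(z - dz, u - du)
                   - 2.0 * z_next_pred) ** 2)

    return pred + cons + curv
```

As a sanity check, if the encoder/decoder pair is exactly invertible on the latent subspace and the latent dynamics are affine, all three terms are zero, which matches the intuition that the loss measures how far the learned representation is from an ideal, control-friendly one.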
We conclude the talk with a brief overview of our current work on representation learning for steerable language models in dialogue management.
Yinlam Chow is currently a senior research scientist at Google Research, where he primarily works on several directions of sequential decision making research as well as their applications to dialogue management, recommender systems, and robotics. He received his PhD in computational mathematics from Stanford. Prior to Google Research, he worked as a research scientist at Osaro Inc. and DeepMind, developing machine learning algorithms for several robotics systems and deploying reinforcement learning algorithms at scale. His research lies at the intersection of sequential decision making under uncertainty, reinforcement learning, and recommender systems.