Learning dynamics of neural networks: the good, the bad and the ugly
Deep neural networks are almost always trained using a simple gradient-based method such as Adam or SGD. Interestingly, in deep learning, optimization and generalization are entangled. For instance, using an appropriately large learning rate can be critical for generalization performance. In this talk, I will first describe our recent work on the learning dynamics of neural networks and how studying them sheds new light on the relationship between generalization and optimization in deep learning. I will close by describing, from a new perspective, some of the open practical challenges in optimization.
Stanisław Jastrzębski is a postdoctoral fellow at New York University and a machine learning lead at molecule.one. He obtained his PhD from Jagiellonian University, advised by Prof. Jacek Tabor and Prof. Amos Storkey (University of Edinburgh). During his PhD studies, he collaborated with Prof. Yoshua Bengio and Google Research. He has published in and reviewed for the leading venues in machine learning (NeurIPS, ICML, ICLR). His research focuses on the optimization of deep networks. He is also passionate about applications of deep learning to drug discovery.