Into the unknown: how to minimize the risk of models making wrong decisions
Machine learning models deployed in production often fail to match the performance measured on a test set held out from the training data. One reason is a poorly chosen test set that does not reflect real-life scenarios. This can easily lead to underestimating the cost of deploying and maintaining a solution, e.g., in autonomous driving or drug discovery, where validating each decision can be extremely expensive. In this talk, I will explain how to make model evaluation more robust through careful data splitting. I will use chemical data as an example. I will also present two methods for estimating the uncertainty of model predictions, which can come in handy when dealing with out-of-distribution data.
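The abstract does not say which splitting strategy the talk uses. As a generic illustration only, the sketch below shows a group-based split, a common way to make a test set harder than a random split: whole groups of related molecules are held out, so no group appears in both train and test. It assumes each molecule already has a group key, such as a precomputed scaffold SMILES string (in practice these might come from a toolkit like RDKit). The function name `group_split`, its parameters, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def group_split(items, group_of, test_fraction=0.2, seed=0):
    """Split items so that no group ends up in both train and test.

    group_of is a caller-supplied (hypothetical) function mapping an item
    to its group key, e.g. a precomputed scaffold string.
    """
    # Collect items by group key.
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    # Shuffle the group keys, not the individual items.
    keys = list(groups)
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(items)
    train, test = [], []
    for key in keys:
        # Move whole groups into the test set until it reaches the target size;
        # because groups are indivisible, the test set may overshoot slightly.
        if len(test) < target:
            test.extend(groups[key])
        else:
            train.extend(groups[key])
    return train, test


# Toy data: (molecule id, assumed precomputed scaffold SMILES).
molecules = [
    ("mol1", "c1ccccc1"), ("mol2", "c1ccccc1"),
    ("mol3", "c1ccncc1"), ("mol4", "c1ccncc1"),
    ("mol5", "C1CCCCC1"), ("mol6", "C1CCCCC1"),
    ("mol7", "c1ccc2ccccc2c1"), ("mol8", "c1ccc2ccccc2c1"),
]
train, test = group_split(molecules, group_of=lambda m: m[1], test_fraction=0.25)
print("train scaffolds:", sorted({m[1] for m in train}))
print("test scaffolds:", sorted({m[1] for m in test}))
```

Compared with shuffling individual molecules, this design forces the model to be evaluated on chemical series it has never seen, which is closer to how a model is used on genuinely new compounds.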
Bio
Tomasz Danel is a data scientist at Ardigen and a Ph.D. candidate in machine learning at the Jagiellonian University. His research interests include deep learning, computer vision, and computer-aided drug design. At Ardigen, he builds image analysis pipelines for high-content screening in drug discovery projects. As a member of GMUM, a machine learning research group, he works on incorporating molecular simulation into deep learning solutions for molecular design.