STable: Table Generation Framework for Encoder-Decoder Models
Since the output structure of database-like tables can cover a wide range of NLP tasks, we propose a framework for text-to-table neural models applicable to problems such as line-item extraction, joint entity and relation extraction, and knowledge base population. The permutation-based decoder at the core of our proposal is a generalized sequential method that aggregates information from all cells in the table. Training maximizes the expected log-likelihood of a table's content over all random permutations of the factorization order. During inference, we exploit the model's ability to generate cells in any order by searching over possible orderings to maximize the model's confidence and to avoid the substantial error accumulation that other sequential models are prone to. Experiments demonstrate the high practical value of the framework, which establishes state-of-the-art results on several challenging datasets, outperforming previous solutions by up to 15%.
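The two ideas in the abstract — training on random permutations of the cell order, and decoding cells in whatever order the model is most confident about — can be sketched as follows. This is a minimal illustration, not the actual STable implementation: the `cell_log_prob` scorer, its `DIFFICULTY` values, and the Monte Carlo approximation of the expectation over permutations are all hypothetical stand-ins for the real encoder-decoder.

```python
import random

# Hypothetical toy scorer standing in for the real encoder-decoder:
# log-probability of generating a cell given the already-generated context.
# The difficulty values and the 1/(1+context) scaling are assumptions
# made only to keep the example self-contained.
DIFFICULTY = {"header": 0.5, "row1": 1.0, "row2": 2.0}

def cell_log_prob(cell, generated):
    # More generated context -> higher confidence (log-prob closer to 0).
    return -DIFFICULTY[cell] / (1 + len(generated))

def sampled_permutation_loss(cells, num_samples=4):
    """Monte Carlo estimate of the negative expected log-likelihood of the
    table's content under uniformly random permutations of the cell order."""
    total = 0.0
    for _ in range(num_samples):
        order = random.sample(cells, len(cells))  # one random factorization order
        generated, ll = [], 0.0
        for cell in order:
            ll += cell_log_prob(cell, generated)
            generated.append(cell)
        total += ll
    return -total / num_samples

def confidence_ordered_decode(cells):
    """Greedy ordering search at inference: at every step, generate the cell
    the model is currently most confident about, then condition on it."""
    remaining, generated = list(cells), []
    while remaining:
        best = max(remaining, key=lambda c: cell_log_prob(c, generated))
        remaining.remove(best)
        generated.append(best)
    return generated
```

With this toy scorer, the decoder fills `header` first, then `row1`, then `row2`: generating the easy cells early raises the model's confidence on the harder ones, which is the intuition behind searching over orderings instead of committing to a fixed left-to-right pass.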
Bio
Michał Turski is a Researcher at Snowflake, where he works on Document AI, a deep-learning tool for information extraction from documents. Michał is also a fourth-year PhD student at Adam Mickiewicz University in Poznań, where he researches data extraction from documents with complex structures. Before that, he completed his Master's in Data Science at the Faculty of Mathematics and Information Sciences of the Warsaw University of Technology.