“Projected Compression: Trainable Projections for Efficient Transformer Compression”
Large language models have steadily increased in size to achieve improved performance; however, this growth has also led to greater inference time and computational demands. Consequently, there is rising interest in model size reduction methods. To address this issue, we propose Projected Compression, a novel model compression technique that reduces model weights using projection modules. Specifically, we first train additional projection weights while preserving access to all of the original model parameters. These projections are then combined with the original weights into a lower-dimensional product matrix, yielding a reduced-size standard Transformer-based model. Unlike alternative approaches that require additional computational overhead, our method matches the per-token computation cost of training a compressed model. Experimental results show that Projected Compression performs especially well relative to other compression methods as the compression rate grows, at rates as high as 90%.
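The core idea described above can be sketched in a few lines. This is a minimal NumPy illustration under assumptions of my own (the variable names, shapes, and initialization are hypothetical, not the paper's code): a frozen original weight is wrapped by two trainable projections, and after training the three matrices are folded into one smaller weight, so the final model is a standard reduced-size layer.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 64, 64   # original layer dimensions (illustrative)
r_in, r_out = 16, 16   # compressed dimensions (illustrative)

# Frozen original weight: kept intact and fully accessible during training.
W = rng.standard_normal((d_out, d_in))

# Trainable projection modules (hypothetical names and init).
P_out = rng.standard_normal((r_out, d_out)) / np.sqrt(d_out)
P_in = rng.standard_normal((d_in, r_in)) / np.sqrt(d_in)

# During projection training, a forward pass uses the projected weight:
#   y = (P_out @ W @ P_in) @ x
x = rng.standard_normal(r_in)
y_train = P_out @ W @ P_in @ x

# Afterwards, the projections are folded into a single smaller matrix,
# leaving a standard compressed layer with no projection overhead.
W_small = P_out @ W @ P_in   # shape (r_out, r_in)
y_final = W_small @ x

assert W_small.shape == (r_out, r_in)
assert np.allclose(y_train, y_final)
```

Because folding is exact, inference with the compressed model is identical to the projected forward pass used in training, which is why the method adds no per-token cost over training a small model directly.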
Bio
Michał Krutul is a PhD student at the Doctoral School of Natural Sciences at the University of Warsaw. As part of his doctorate, he conducts research on improving the efficiency of large language models. He has published work in this field at top machine learning conferences (NeurIPS, ICML).
Before diving into his doctorate, he wore quite a few hats: Machine Learning Engineer (building clever models), DevOps Engineer (taming servers and speeding up deployments), Software Engineer (making things actually work), and Frontend Engineer (making things look good while they worked). Basically, if it involved code, he probably poked at it, fixed it, or made it faster.
