Controlling Generative Models through Mechanistic Localization
Understanding and controlling generative models is essential for aligning their outputs with human intent. But what if I told you that such control can be achieved by modifying less than 1% of a model's parameters? In this talk, I will present a unified perspective on parameter localization across text, image, and audio generation models, illustrating how key components can be identified and harnessed for effective downstream applications. Building on our ICLR 2025 paper, which shows that only a small fraction of a diffusion model's parameters governs textual content in image generation, I will demonstrate how precisely localizing and modulating these layers enables fine-grained image editing, efficient fine-tuning, and robust mitigation of undesired text generation. I will then introduce our follow-up work on audio generation models, where we identify functional components that control musical attributes, such as tempo, instrumentation, and vocal style, by patching individual cross-attention layers.
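To give a flavor of what "patching individual cross-attention layers" means in practice, here is a minimal, self-contained sketch of activation patching: recording a layer's output under one conditioning signal and splicing it into a forward pass under another. The `ToyGenerator` model, its block structure, and the random conditioning tensors below are hypothetical stand-ins for illustration, not the actual models or code from the papers.

```python
import torch
import torch.nn as nn


class ToyCrossAttnBlock(nn.Module):
    """A stand-in for one cross-attention block of a generative model."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Latent tokens attend to the conditioning (e.g. text/prompt) tokens.
        out, _ = self.attn(query=x, key=context, value=context)
        return x + out


class ToyGenerator(nn.Module):
    """A stack of cross-attention blocks standing in for a generative backbone."""
    def __init__(self, dim: int = 64, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(ToyCrossAttnBlock(dim) for _ in range(depth))

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x, context)
        return x


@torch.no_grad()
def run_with_patch(model, x, context, donor_context, layer_idx):
    """Run the model on `context`, but replace the output of one block
    with the activation recorded from a 'donor' pass on `donor_context`."""
    cache = {}

    # 1. Record the donor activation at the chosen layer.
    def record(module, inputs, output):
        cache["act"] = output
    handle = model.blocks[layer_idx].register_forward_hook(record)
    model(x, donor_context)
    handle.remove()

    # 2. Patch the recorded activation into the original forward pass
    #    (returning a value from a forward hook overrides the output).
    def patch(module, inputs, output):
        return cache["act"]
    handle = model.blocks[layer_idx].register_forward_hook(patch)
    out = model(x, context)
    handle.remove()
    return out


model = ToyGenerator()
x = torch.randn(1, 16, 64)      # latent tokens
ctx_a = torch.randn(1, 8, 64)   # conditioning for prompt A
ctx_b = torch.randn(1, 8, 64)   # conditioning for prompt B
patched = run_with_patch(model, x, ctx_a, donor_context=ctx_b, layer_idx=2)
```

If patching only layer 2 transfers a given attribute from prompt B to prompt A's output, that is evidence the attribute is localized there; sweeping `layer_idx` over all blocks turns this into a simple localization experiment.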
Bio
Łukasz Staniszewski is a PhD student at the Warsaw University of Technology and an AI researcher at the IDEAS Research Institute. His primary interest lies in understanding how generative models work under the hood in order to enable more effective control over them. He has published at ICLR, and his professional experience includes roles at the Samsung R&D Institute and CISPA.
