Perceiving Systems, Computer Vision

Controlling Text-to-Image Diffusion by Orthogonal Finetuning

2023

Conference Paper

Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks remains an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT provably preserves hyperspherical energy, which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
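
The energy-preservation property described in the abstract can be illustrated with a small NumPy sketch: multiplying every neuron by the same orthogonal matrix leaves all pairwise distances between normalized neurons unchanged, so an energy defined on those distances does not move. The energy formula used here (sum of inverse pairwise distances), the Cayley construction of the orthogonal matrix, and all names and shapes are assumptions chosen for illustration; they are not taken from this page or from the authors' code.

```python
import numpy as np

def hyperspherical_energy(W):
    """Sum of inverse pairwise distances between unit-normalized columns of W.

    W has shape (d, n): n neurons, each a d-dimensional weight vector.
    This particular energy form is an assumption made for illustration.
    """
    W_hat = W / np.linalg.norm(W, axis=0, keepdims=True)
    diff = W_hat[:, :, None] - W_hat[:, None, :]      # shape (d, n, n)
    dist = np.linalg.norm(diff, axis=0)               # shape (n, n)
    off_diag = ~np.eye(W.shape[1], dtype=bool)        # skip i == j
    return float(np.sum(1.0 / dist[off_diag]))

def cayley_orthogonal(A):
    """Build an orthogonal matrix from an arbitrary square matrix A.

    S = A - A^T is skew-symmetric, so (I + S)(I - S)^{-1} is orthogonal.
    """
    S = A - A.T
    I = np.eye(A.shape[0])
    return (I + S) @ np.linalg.inv(I - S)

rng = np.random.default_rng(0)
d, n = 8, 5
W0 = rng.standard_normal((d, n))                      # stand-in "pretrained" weights

# Orthogonal update: every neuron is transformed by the same matrix R.
R = cayley_orthogonal(0.1 * rng.standard_normal((d, d)))
W_oft = R @ W0

# Additive update of the same weights, for comparison.
W_add = W0 + 0.5 * rng.standard_normal((d, n))

print("energy, original weights:  ", hyperspherical_energy(W0))
print("energy, orthogonal update: ", hyperspherical_energy(W_oft))
print("energy, additive update:   ", hyperspherical_energy(W_add))
```

In this sketch the first two printed values agree up to floating-point error, while an unconstrained additive update generally changes the energy. The sketch does not cover the radius constraint of COFT.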

Author(s): Qiu*, Z. and Liu*, W. and Feng, H. and Xue, Y. and Feng, Y. and Liu, Z. and Zhang, D. and Weller, A. and Schölkopf, B.
Book Title: Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Volume: 36
Pages: 79320--79362
Year: 2023
Month: December
Editors: A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine
Publisher: Curran Associates, Inc.

Department(s): Empirical Inference, Perceiving Systems
Bibtex Type: Conference Paper (conference)

Event Name: 37th Annual Conference on Neural Information Processing Systems
Event Place: New Orleans, USA

Note: *equal contribution
State: Published
URL: https://proceedings.neurips.cc/paper_files/paper/2023/file/faacb7a4827b4d51e201666b93ab5fa7-Paper-Conference.pdf

BibTeX

@conference{Qiuetal23,
  title = {Controlling Text-to-Image Diffusion by Orthogonal Finetuning},
  author = {Qiu*, Z. and Liu*, W. and Feng, H. and Xue, Y. and Feng, Y. and Liu, Z. and Zhang, D. and Weller, A. and Sch{\"o}lkopf, B.},
  booktitle = {Advances in Neural Information Processing Systems 36 (NeurIPS 2023)},
  volume = {36},
  pages = {79320--79362},
  editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
  publisher = {Curran Associates, Inc.},
  month = dec,
  year = {2023},
  note = {*equal contribution},
  url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/faacb7a4827b4d51e201666b93ab5fa7-Paper-Conference.pdf},
  month_numeric = {12}
}