On GAN-based image manipulation

This series of works investigates the potential of state-of-the-art GANs to manipulate natural images. We have developed several unsupervised methods that exploit pretrained GANs for advanced semantic editing and object segmentation.
The latent spaces of GAN models often have semantically meaningful directions. Moving in these directions corresponds to human-interpretable image transformations, such as zooming or recoloring, enabling a more controllable generation process. However, the discovery of such directions is currently performed in a supervised manner, requiring human labels, pretrained models, or some form of self-supervision. These requirements severely restrict the range of directions existing approaches can discover. In this paper, we introduce an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN model. By a simple model-agnostic procedure, we find directions corresponding to sensible semantic manipulations without any form of (self-)supervision. Furthermore, we reveal several non-trivial findings, which would be difficult to obtain by existing methods, e.g., a direction corresponding to background removal. As an immediate practical benefit of our work, we show how to exploit this finding to achieve competitive performance for weakly-supervised saliency detection. The implementation of our method is available online.
Unsupervised Discovery of Interpretable Directions in the GAN Latent Space

Andrey Voynov, Artem Babenko
ICML 2020

@inproceedings{voynov2020unsupervised,
  title={Unsupervised discovery of interpretable directions in the gan latent space},
  author={Voynov, Andrey and Babenko, Artem},
  booktitle={International Conference on Machine Learning},
  pages={9786--9796}, year={2020}, organization={PMLR}
}

Authors

Artem Babenko
Yandex Research

Andrey Voynov
Yandex Research
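
For readers who want to experiment, here is a minimal sketch of the training loop behind the direction-discovery procedure described in the abstract above. It assumes a pretrained generator G and a reconstructor network that takes an image pair and predicts both the direction index and the shift magnitude; the names, loss weighting, and hyperparameters below are illustrative assumptions, not the authors' exact implementation (which is available in the released code).

import torch
import torch.nn.functional as F

def train_directions(G, reconstructor, latent_dim=512, num_directions=64,
                     steps=100_000, batch=32, device="cuda"):
    # Columns of A are the candidate latent directions; they are trained
    # jointly with the reconstructor, which must recover which direction
    # was applied and how far the latent code was shifted.
    A = torch.randn(latent_dim, num_directions, device=device, requires_grad=True)
    opt = torch.optim.Adam([A] + list(reconstructor.parameters()), lr=1e-4)
    for _ in range(steps):
        z = torch.randn(batch, latent_dim, device=device)
        k = torch.randint(num_directions, (batch,), device=device)  # direction index
        eps = torch.rand(batch, device=device) * 12 - 6             # shift magnitude
        shift = eps.unsqueeze(1) * F.normalize(A, dim=0)[:, k].T    # unit-norm columns
        k_logits, eps_pred = reconstructor(G(z), G(z + shift))
        # The reconstructor succeeds only if different columns of A induce
        # distinct, consistent image transformations, which pushes the
        # learned directions towards disentangled, interpretable edits.
        loss = F.cross_entropy(k_logits, k) + F.l1_loss(eps_pred, eps)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(A, dim=0).detach()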

Generative Adversarial Networks (GANs) are currently an indispensable tool for visual editing, being a standard component of image-to-image translation and image restoration pipelines. Furthermore, GANs are especially useful for controllable generation since their latent spaces contain a wide range of interpretable directions, well suited for semantic editing operations. By gradually changing latent codes along these directions, one can produce impressive visual effects, unattainable without GANs. In this paper, we significantly expand the range of visual effects achievable with state-of-the-art models, like StyleGAN2. In contrast to existing works, which mostly operate on latent codes, we discover interpretable directions in the space of the generator parameters. By several simple methods, we explore this space and demonstrate that it also contains a plethora of interpretable directions, which are an excellent source of non-trivial semantic manipulations. The discovered manipulations cannot be achieved by transforming the latent codes and can be used to edit both synthetic and real images. We release our code and models and hope they will serve as a handy tool for further efforts on GAN-based image editing.
Navigating the GAN Parameter Space for Semantic Image Editing

Anton Cherepkov, Andrey Voynov, Artem Babenko
CVPR 2021



@inproceedings{Navigan_CVPR_2021,
  title     = {Navigating the GAN Parameter Space for Semantic Image Editing},
  author    = {Cherepkov, Anton and Voynov, Andrey and Babenko, Artem},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2021}
}

Authors

Artem Babenko
Yandex Research

Andrey Voynov
Yandex Research

Anton Cherepkov
Yandex Research
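
The key difference from latent-space editing is that the shift is applied to the generator weights rather than to the latent code. Below is a minimal sketch of applying such an edit, assuming a previously discovered parameter-space direction delta (a dict mapping parameter names to tensors); the helper name and usage are hypothetical illustrations, not the released API.

import copy
import torch

@torch.no_grad()
def edit_in_parameter_space(G, delta, strength):
    # The latent code is left untouched: the generator *weights* are
    # shifted along the discovered direction instead.
    G_edited = copy.deepcopy(G)
    for name, p in G_edited.named_parameters():
        if name in delta:
            p.add_(strength * delta[name].to(p.device))
    return G_edited

# Usage: the same z rendered by G and by the edited copy differs only by
# the semantic manipulation encoded in delta.
# z = torch.randn(1, 512, device="cuda")
# before = G(z)
# after = edit_in_parameter_space(G, delta, strength=2.0)(z)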

Since collecting pixel-level groundtruth data is expensive, unsupervised visual understanding problems are currently an active research topic. In particular, several recent methods based on generative models have achieved promising results for object segmentation and saliency detection. However, since generative models are known to be unstable and sensitive to hyperparameters, the training of these methods can be challenging and time-consuming. In this work, we introduce an alternative, much simpler way to exploit generative models for unsupervised object segmentation. First, we explore the latent space of BigBiGAN, the state-of-the-art unsupervised GAN whose parameters are publicly available. We demonstrate that object saliency masks for GAN-produced images can be obtained automatically with BigBiGAN. These masks are then used to train a discriminative segmentation model. Being very simple and easy to reproduce, our approach provides competitive performance on common benchmarks in the unsupervised scenario.
Object Segmentation Without Labels with Large-Scale Generative Models

Andrey Voynov, Stanislav Morozov, Artem Babenko
ICML 2021


@misc{voynov2020big,
  title={Big GANs Are Watching You: Towards Unsupervised Object Segmentation with Off-the-Shelf Generative Models},
  author={Andrey Voynov and Stanislav Morozov and Artem Babenko},
  year={2020},
  eprint={2006.04988}, archivePrefix={arXiv}, primaryClass={cs.LG}
}

Authors

Artem Babenko
Yandex Research

Stanislav Morozov
Yandex Research

Andrey Voynov
Yandex Research
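
The two-stage pipeline from the abstract fits in a few lines: synthesize images with BigBiGAN, extract saliency masks automatically, then train any standard segmentation network on the resulting pairs. The sketch below assumes a background-suppressing latent direction is already available; the helper names, threshold, and mask rule are illustrative assumptions rather than the exact released recipe.

import torch
import torch.nn.functional as F

@torch.no_grad()
def make_synthetic_dataset(G, bg_direction, latent_dim, num_samples,
                           threshold=0.25, device="cuda"):
    images, masks = [], []
    for _ in range(num_samples):
        z = torch.randn(1, latent_dim, device=device)
        img = G(z)
        img_no_bg = G(z + bg_direction)  # shift that suppresses the background
        # Pixels that change strongly under the shift belong to the
        # background; the stable ones form the foreground saliency mask.
        diff = (img - img_no_bg).abs().mean(dim=1, keepdim=True)
        masks.append((diff < threshold).float())
        images.append(img)
    return torch.cat(images), torch.cat(masks)

def train_segmenter(segmenter, images, masks, epochs=5, batch=16, lr=1e-4):
    # Any off-the-shelf discriminative model (e.g. a U-Net) works here;
    # at test time it is applied to real images, not GAN samples.
    opt = torch.optim.Adam(segmenter.parameters(), lr=lr)
    for _ in range(epochs):
        for i in range(0, len(images), batch):
            logits = segmenter(images[i:i + batch])
            loss = F.binary_cross_entropy_with_logits(logits, masks[i:i + batch])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return segmenter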
