What deep generative models can do with images
In a series of papers, we investigated the potential of generative adversarial networks (GANs) to manipulate natural images. Our unsupervised methods exploit pretrained GANs for advanced semantic editing and object segmentation.
Generative adversarial networks revolutionized image processing research and pushed the boundaries of machine learning for image enhancement. Today they are an indispensable tool for visual editing and a standard component of both image-to-image translation and image restoration pipelines.
By studying latent manipulations, we found that it is possible to discover latent shifts whose effect on the generated images is semantically meaningful. For instance, the wheels of cars become bigger, people begin to smile, summer turns into winter, and so on.
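To make the idea concrete, here is a minimal sketch of latent-shift editing in PyTorch. The tiny generator and the direction vector are hypothetical stand-ins: in practice the generator would be a pretrained GAN such as StyleGAN2, and the direction would be one discovered by the methods described below.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained GAN generator (hypothetical; in practice this
# would be, e.g., a StyleGAN2 generator loaded from a checkpoint).
latent_dim = 128
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
)

# A semantic direction in latent space (random here for illustration;
# the papers below describe how such directions can be discovered).
direction = torch.randn(latent_dim)
direction = direction / direction.norm()

z = torch.randn(1, latent_dim)             # latent code of the original image
alpha = 3.0                                # shift magnitude controls edit strength

original = generator(z)
edited = generator(z + alpha * direction)  # semantically shifted image
```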
Previously, researchers searched for such transformations in a supervised mode, relying on annotations. The supervised setup can only reveal what we already expect to see. In an unsupervised setup, by contrast, we let the model itself reveal the manipulations it is capable of. Our work was the first to address the search for latent manipulations in a fully unsupervised manner.
First paper
In 2020, we introduced an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN [1]. Inspired by semi-supervised approaches, our method sets up a game that a generative model plays with itself in order to find interpretable directions (sketched below). This work revealed a huge variety of manipulations that a generative model can produce.
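Below is a minimal PyTorch sketch of such a self-played game, under simplifying assumptions: the generator is a small stand-in for a frozen pretrained GAN, and the reconstructor architecture and loss weighting are placeholders rather than the exact configuration used in [1]. A matrix of candidate directions is trained jointly with a reconstructor that must recognize which direction was applied to a latent code and how far it was moved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, n_directions = 128, 32

# Frozen stand-in for a pretrained generator.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
)
for p in generator.parameters():
    p.requires_grad_(False)

# Learnable candidate directions and a "reconstructor" that must guess
# which direction was applied and with what magnitude.
directions = nn.Parameter(torch.randn(n_directions, latent_dim))
reconstructor = nn.Sequential(
    nn.Linear(2 * 3 * 32 * 32, 256), nn.ReLU(),
    nn.Linear(256, n_directions + 1),   # direction logits + shift magnitude
)

opt = torch.optim.Adam([directions, *reconstructor.parameters()], lr=1e-4)

for step in range(100):                                  # toy training loop
    z = torch.randn(8, latent_dim)
    k = torch.randint(0, n_directions, (8,))             # which direction
    eps = torch.empty(8, 1).uniform_(-6, 6)              # how far to move
    shift = eps * F.normalize(directions, dim=1)[k]

    pair = torch.cat([generator(z), generator(z + shift)], dim=1)
    out = reconstructor(pair)
    logits, eps_pred = out[:, :-1], out[:, -1:]

    # The reconstructor can only succeed if the directions produce
    # distinguishable, consistent image changes.
    loss = F.cross_entropy(logits, k) + F.l1_loss(eps_pred, eps)
    opt.zero_grad()
    loss.backward()
    opt.step()
```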
Second paper
In the following paper, we expanded the range of visual effects achievable with state-of-the-art models like StyleGAN2 [2]. We showed that it is possible to manipulate images by modifying the generator parameters instead of the latents. Our method uses interpretable directions in the space of generator parameters for semantic editing.
The new approach performs well both for synthesized images and for real-life photographs. For example, the model can change the thickness of a horse in a photo. Even more surprisingly, it also works for models that have no latent space at all!
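The following sketch illustrates the idea of editing by shifting the generator's weights rather than its latent input. The toy generator and the random parameter-space direction are hypothetical placeholders; the paper describes how interpretable directions in parameter space are actually found.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained generator (hypothetical placeholder).
latent_dim = 128
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
)

# Flatten all generator weights into a single parameter vector.
theta = torch.cat([p.detach().flatten() for p in generator.parameters()])

# A direction in *parameter* space (random here for illustration).
direction = torch.randn_like(theta)
direction = direction / direction.norm()

def generate_with_params(params, z):
    """Run the generator with an externally supplied parameter vector."""
    new_params, offset = {}, 0
    for name, p in generator.named_parameters():
        n = p.numel()
        new_params[name] = params[offset:offset + n].view_as(p)
        offset += n
    return torch.func.functional_call(generator, new_params, (z,))

z = torch.randn(1, latent_dim)
original = generate_with_params(theta, z)
edited = generate_with_params(theta + 5.0 * direction, z)  # same z, shifted weights
```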
Third paper
With our next paper, we continued to investigate the benefits generative models may bring to discriminative tasks [3]. Here we focused on unsupervised object segmentation. We demonstrated that large unsupervised models can handle complex object segmentation tasks while requiring neither pixel-level nor image-level labelling: unsupervised GANs can separate foreground from background pixels and provide good saliency masks.
Exploring images generated by the BigBiGAN model, we discovered that the latent space semantics automatically yield saliency masks for synthetic images. These synthetic data serve as a source of supervision for discriminative computer vision models, and the resulting method outperforms existing unsupervised competitors.
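Here is a minimal sketch of how synthetic saliency masks could supervise a segmentation network. It assumes a latent direction that mostly alters the background is available, and it uses toy stand-ins for both the generator and the segmenter; the actual procedure in [3] builds on BigBiGAN and differs in its details.

```python
import torch
import torch.nn as nn

# Stand-ins for a pretrained generator and an assumed latent direction that
# predominantly changes the background (both hypothetical placeholders).
latent_dim, H, W = 128, 32, 32
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * H * W), nn.Tanh(),
)
background_direction = torch.randn(latent_dim)

@torch.no_grad()
def synth_image_and_mask(z, alpha=4.0, thresh=0.1):
    """Generate an image and a pseudo saliency mask from the pixels that
    stay stable under a background-changing latent shift."""
    img = generator(z).view(-1, 3, H, W)
    shifted = generator(z + alpha * background_direction).view(-1, 3, H, W)
    # Pixels that change a lot are treated as background, stable pixels as foreground.
    change = (img - shifted).abs().mean(dim=1, keepdim=True)
    mask = (change < thresh).float()
    return img, mask

# The synthetic (image, mask) pairs then supervise an ordinary segmentation
# network (a tiny conv net here; a U-Net would be a natural choice).
segmenter = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(segmenter.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):                        # toy training loop
    z = torch.randn(8, latent_dim)
    imgs, masks = synth_image_and_mask(z)
    loss = bce(segmenter(imgs), masks)
    opt.zero_grad()
    loss.backward()
    opt.step()
```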
Conclusion
Generative models provide a low-dimensional parametrization of the image manifold. Its geometry is connected to meaningful image transformations and opens the door to many applications.
References
- Andrey Voynov, Artem Babenko. Unsupervised Discovery of Interpretable Directions in the GAN Latent Space. ICML 2020.
- Anton Cherepkov, Andrey Voynov, Artem Babenko. Navigating the GAN Parameter Space for Semantic Image Editing. CVPR 2021.
- Andrey Voynov, Stanislav Morozov, Artem Babenko. Object Segmentation Without Labels with Large-Scale Generative Models. ICML 2021.