What deep generative models can do with images

July 6, 2022

By Andrey Voynov

In a series of papers, we investigated the potential of generative adversarial networks (GANs) to manipulate natural images. Our unsupervised methods exploit pretrained GANs for advanced semantic editing and object segmentation.

Generative adversarial networks revolutionized image processing research and pushed the boundaries of machine learning for picture enhancement. Today, they are an indispensable tool for visual editing. GANs are a standard component of both image-to-image translation and image restoration pipelines.

 

Modern generative models take a latent code and generate a realistic image. The space of latent codes appears to have a natural geometry: moving a code through this space induces a meaningful change in the generated image.

By studying latent manipulations, we found that it is possible to find latent shifts that produce semantically meaningful image transformations. For instance, the wheels of cars become bigger, people begin to smile, or summer turns into winter.
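A latent shift can be illustrated with a minimal sketch. Everything here is a stand-in chosen for illustration: `toy_generator`, its random weights, and the random `direction` are hypothetical, whereas a real setup would use a pretrained GAN and a direction found by one of the methods described in this post.

```python
import numpy as np

def toy_generator(z, weights):
    """Hypothetical stand-in for a pretrained generator: latent code -> flat 'image'."""
    return np.tanh(weights @ z)

def shift_latent(z, direction, magnitude):
    """Move a latent code along a unit-normalized direction by `magnitude`."""
    unit = direction / np.linalg.norm(direction)
    return z + magnitude * unit

rng = np.random.default_rng(0)
latent_dim, image_size = 8, 16
weights = rng.normal(size=(image_size, latent_dim)) / np.sqrt(latent_dim)
z = rng.normal(size=latent_dim)
direction = rng.normal(size=latent_dim)  # assumption: random here, discovered in practice

# Sweeping the magnitude yields a sequence of gradually edited images.
edits = [toy_generator(shift_latent(z, direction, eps), weights)
         for eps in (-2.0, 0.0, 2.0)]
```

With a zero magnitude the image is unchanged, so the middle element of `edits` is the original sample and the outer ones are the same content moved in opposite directions.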

Previously, researchers found such manipulations in a supervised mode, using labeled data to define the transformations. A supervised setup can only find what we already expect to see. In an unsupervised setup, the model itself reveals the manipulations it can perform. Our work was the first to address the search for latent manipulations in a fully unsupervised manner.

Image transformations after applying one of our methods.

First paper

In 2020, we introduced an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN [1]. This was the first work to address the problem. Inspired by some semi-supervised approaches, our method sets up a game that the generative model plays with itself, with the goal of finding interpretable directions. This work revealed a wide variety of manipulations that a generative model can produce.

Some of the image transformations discovered by our method.
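One way to picture such a self-play game is sketched below, under stated assumptions: a set of candidate directions is applied to random latent codes, and a second model has to guess from the image pair which direction was used and how far the code moved. The names `toy_generator`, `make_game_sample`, and `game_loss`, the specific objective, and the values of `max_shift` and `shift_weight` are all illustrative choices for this sketch, not a transcription of the training code from [1].

```python
import numpy as np

def toy_generator(z, weights):
    """Hypothetical stand-in for a pretrained generator."""
    return np.tanh(weights @ z)

def make_game_sample(rng, weights, directions, max_shift=6.0):
    """Build one example: an image pair plus the hidden (index, magnitude) target."""
    latent_dim, num_directions = directions.shape
    z = rng.normal(size=latent_dim)
    k = int(rng.integers(num_directions))            # which direction is applied
    eps = float(rng.uniform(-max_shift, max_shift))  # how far the code is moved
    original = toy_generator(z, weights)
    shifted = toy_generator(z + eps * directions[:, k], weights)
    return (original, shifted), (k, eps)

def game_loss(class_logits, predicted_eps, k, eps, shift_weight=0.25):
    """Assumed objective: cross-entropy on the index plus absolute error on the shift."""
    stable = class_logits - np.max(class_logits)
    log_probs = stable - np.log(np.sum(np.exp(stable)))
    return float(-log_probs[k] + shift_weight * abs(predicted_eps - eps))

rng = np.random.default_rng(0)
latent_dim, image_size, num_directions = 8, 16, 4
weights = rng.normal(size=(image_size, latent_dim)) / np.sqrt(latent_dim)
directions = rng.normal(size=(latent_dim, num_directions))
directions /= np.linalg.norm(directions, axis=0, keepdims=True)  # unit-norm columns

(original, shifted), (k, eps) = make_game_sample(rng, weights, directions)
# A guesser with no information: uniform logits and a zero shift estimate.
uninformed_loss = game_loss(np.zeros(num_directions), 0.0, k, eps)
```

The intuition behind this kind of setup is that directions are easy to tell apart only when each one changes the image in a distinct, consistent way, which is what tends to make them interpretable.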

Second paper

In the following paper, we expanded the range of visual effects achievable with state-of-the-art models such as StyleGAN2 [2]. We showed that it is possible to manipulate images by modifying the generator parameters instead of the latent codes. Our method uses interpretable directions in the space of generator parameters for semantic editing.

Some of the image manipulations achieved by changing the weights of a generative model. Notably, some of them are unreachable by latent code shifts.

The new approach performs well on both synthesized images and real-life photographs. For example, the model can change the thickness of a horse in a photo. Even more surprisingly, it also works for models without a latent space at all!
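Editing in parameter space can be sketched in the same spirit as a latent shift, except that the input stays fixed and the weights move. The two-layer `toy_generator`, the `shift_params` helper, and the random direction restricted to the first layer are assumptions made for this illustration only; they do not reproduce the procedure from [2].

```python
import numpy as np

def toy_generator(z, params):
    """Hypothetical two-layer generator; `params` maps layer names to weight arrays."""
    hidden = np.maximum(params["w1"] @ z, 0.0)
    return np.tanh(params["w2"] @ hidden)

def shift_params(params, direction, magnitude):
    """Return new parameters moved along `direction`; layers absent from it are unchanged."""
    return {name: value + magnitude * direction.get(name, 0.0)
            for name, value in params.items()}

rng = np.random.default_rng(0)
latent_dim, hidden_dim, image_size = 8, 32, 16
params = {
    "w1": rng.normal(size=(hidden_dim, latent_dim)) / np.sqrt(latent_dim),
    "w2": rng.normal(size=(image_size, hidden_dim)) / np.sqrt(hidden_dim),
}
# Assumption: a random direction touching only the first layer; in practice it is learned.
direction = {"w1": rng.normal(size=(hidden_dim, latent_dim)) / np.sqrt(latent_dim)}

z = rng.normal(size=latent_dim)  # the input is held fixed
original = toy_generator(z, params)
edited = toy_generator(z, shift_params(params, direction, 1.5))
```

Because nothing in `shift_params` refers to the input, the same edit applies to any generator with trainable weights, whether or not it has a latent space.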

Third paper

With our next paper, we continued to investigate the benefits generative models may bring to discriminative tasks [3]. Here we focused on unsupervised object segmentation. We demonstrated that large unsupervised models can handle complex object segmentation tasks without requiring either pixel-level or image-level labelling. Unsupervised GANs make it possible to distinguish foreground pixels from background pixels, providing good saliency masks.

This picture shows how we generate saliency masks. The model allows us to synthesize a pair of images with the same content, with the object highlighted in one of them. A simple color intensity comparison gives us the mask.
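The intensity comparison can be sketched as follows. The sketch assumes that "highlighted" means the object is brighter in the second image than in the first, and that intensity is the plain mean over the color channels; both are simplifications made for illustration, and the synthetic 4 x 4 image pair is made up.

```python
import numpy as np

def to_intensity(image):
    """Average the color channels of an H x W x 3 image into one intensity map."""
    return image.mean(axis=-1)

def saliency_mask(original, highlighted):
    """Mark the pixels that are brighter in the highlighted image than in the original."""
    return to_intensity(highlighted) > to_intensity(original)

# Made-up pair: same content, with a 2 x 2 object brightened and the background darkened.
original = np.full((4, 4, 3), 0.5)
highlighted = original - 0.2
highlighted[1:3, 1:3] = original[1:3, 1:3] + 0.3

mask = saliency_mask(original, highlighted)  # boolean H x W foreground mask
```

Masks produced this way can be paired with their synthetic images to form a labeled dataset without any manual annotation.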

Exploring images generated by the BigBiGAN model, we discovered that the semantics of its latent space make it possible to create saliency masks for synthetic images automatically. This synthetic data serves as a source of supervision for discriminative computer vision models. The method also outperforms existing unsupervised competitors.

Conclusion

Generative models produce a low-dimensional parametrization of the image manifold. Its geometry is linked to meaningful image transformations and opens up many applications.

[1] Andrey Voynov, Artem Babenko. Unsupervised Discovery of Interpretable Directions in the GAN Latent Space. ICML 2020.
[2] Anton Cherepkov, Andrey Voynov, Artem Babenko. Navigating the GAN Parameter Space for Semantic Image Editing. CVPR 2021.
[3] Andrey Voynov, Stanislav Morozov, Artem Babenko. Object Segmentation Without Labels with Large-Scale Generative Models. ICML 2021.