What deep generative models can do with images
In a series of papers, we investigated the potential of generative adversarial networks (GANs) to manipulate natural images. Our unsupervised methods exploit pretrained GANs for advanced semantic editing and object segmentation.
Generative adversarial networks revolutionized image processing research and pushed the boundaries of machine learning for image enhancement. Today they are an indispensable tool for visual editing and a standard component of both image-to-image translation and image restoration pipelines.
By studying latent manipulations, we found that it is possible to discover latent shifts whose effect on the generated images is semantically meaningful. For instance, the wheels of cars become bigger, people begin to smile, summer turns into winter, and so on.
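To make the idea concrete, here is a minimal sketch of latent-shift editing in PyTorch. The tiny generator and the direction vector are hypothetical stand-ins: in practice the generator would be a pretrained GAN such as StyleGAN2, and the direction would be one discovered by the methods described below.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained GAN generator (hypothetical; in practice this
# would be, e.g., a StyleGAN2 generator loaded from a checkpoint).
latent_dim = 128
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
)

# A semantic direction in latent space (random here for illustration;
# the papers below describe how such directions can be discovered).
direction = torch.randn(latent_dim)
direction = direction / direction.norm()

z = torch.randn(1, latent_dim)             # latent code of the original image
alpha = 3.0                                # shift magnitude controls edit strength

original = generator(z)
edited = generator(z + alpha * direction)  # semantically shifted image
```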
Previously, researchers searched for such transformations in a supervised mode, relying on annotations. The supervised setup can only reveal what we already expect to see. In an unsupervised setup, by contrast, we let the model itself reveal the manipulations it is capable of. Our work was the first to address the search for latent manipulations in a fully unsupervised manner.
First paper
In 2020, we introduced an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN [1]. Inspired by semi-supervised approaches, our method sets up a game that a generative model plays with itself in order to find interpretable directions (sketched below). This work revealed a huge variety of manipulations that a generative model can produce.
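Below is a minimal PyTorch sketch of such a self-played game, under simplifying assumptions: the generator is a small stand-in for a frozen pretrained GAN, and the reconstructor architecture and loss weighting are placeholders rather than the exact configuration used in [1]. A matrix of candidate directions is trained jointly with a reconstructor that must recognize which direction was applied to a latent code and how far it was moved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, n_directions = 128, 32

# Frozen stand-in for a pretrained generator.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
)
for p in generator.parameters():
    p.requires_grad_(False)

# Learnable candidate directions and a "reconstructor" that must guess
# which direction was applied and with what magnitude.
directions = nn.Parameter(torch.randn(n_directions, latent_dim))
reconstructor = nn.Sequential(
    nn.Linear(2 * 3 * 32 * 32, 256), nn.ReLU(),
    nn.Linear(256, n_directions + 1),   # direction logits + shift magnitude
)

opt = torch.optim.Adam([directions, *reconstructor.parameters()], lr=1e-4)

for step in range(100):                                  # toy training loop
    z = torch.randn(8, latent_dim)
    k = torch.randint(0, n_directions, (8,))             # which direction
    eps = torch.empty(8, 1).uniform_(-6, 6)              # how far to move
    shift = eps * F.normalize(directions, dim=1)[k]

    pair = torch.cat([generator(z), generator(z + shift)], dim=1)
    out = reconstructor(pair)
    logits, eps_pred = out[:, :-1], out[:, -1:]

    # The reconstructor can only succeed if the directions produce
    # distinguishable, consistent image changes.
    loss = F.cross_entropy(logits, k) + F.l1_loss(eps_pred, eps)
    opt.zero_grad()
    loss.backward()
    opt.step()
```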
Second paper
In the following paper, we expanded the range of visual effects achievable with state-of-the-art models like StyleGAN2 [2]. We showed that it is possible to manipulate images by modifying the generator parameters instead of the latents. Our method uses interpretable directions in the space of generator parameters for semantic editing.
The new approach performs well both for synthesized images and for real-life photographs. For example, the model can change the thickness of a horse in a photo. Even more surprisingly, it also works for models that have no latent space at all!
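The following sketch illustrates the idea of editing by shifting the generator's weights rather than its latent input. The toy generator and the random parameter-space direction are hypothetical placeholders; the paper describes how interpretable directions in parameter space are actually found.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained generator (hypothetical placeholder).
latent_dim = 128
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
)

# Flatten all generator weights into a single parameter vector.
theta = torch.cat([p.detach().flatten() for p in generator.parameters()])

# A direction in *parameter* space (random here for illustration).
direction = torch.randn_like(theta)
direction = direction / direction.norm()

def generate_with_params(params, z):
    """Run the generator with an externally supplied parameter vector."""
    new_params, offset = {}, 0
    for name, p in generator.named_parameters():
        n = p.numel()
        new_params[name] = params[offset:offset + n].view_as(p)
        offset += n
    return torch.func.functional_call(generator, new_params, (z,))

z = torch.randn(1, latent_dim)
original = generate_with_params(theta, z)
edited = generate_with_params(theta + 5.0 * direction, z)  # same z, shifted weights
```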
Third paper
With our next paper, we continued to investigate the benefits generative models may bring to discriminative tasks [3]. Here we focused on unsupervised object segmentation. We demonstrated that large unsupervised models can handle complex object segmentation tasks while requiring neither pixel-level nor image-level labelling: unsupervised GANs can separate foreground from background pixels and provide good saliency masks.
Exploring images generated by the BigBiGAN model, we discovered that the latent space semantics automatically yield saliency masks for synthetic images. These synthetic data serve as a source of supervision for discriminative computer vision models, and the resulting method outperforms existing unsupervised competitors.
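Here is a minimal sketch of how synthetic saliency masks could supervise a segmentation network. It assumes a latent direction that mostly alters the background is available, and it uses toy stand-ins for both the generator and the segmenter; the actual procedure in [3] builds on BigBiGAN and differs in its details.

```python
import torch
import torch.nn as nn

# Stand-ins for a pretrained generator and an assumed latent direction that
# predominantly changes the background (both hypothetical placeholders).
latent_dim, H, W = 128, 32, 32
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 3 * H * W), nn.Tanh(),
)
background_direction = torch.randn(latent_dim)

@torch.no_grad()
def synth_image_and_mask(z, alpha=4.0, thresh=0.1):
    """Generate an image and a pseudo saliency mask from the pixels that
    stay stable under a background-changing latent shift."""
    img = generator(z).view(-1, 3, H, W)
    shifted = generator(z + alpha * background_direction).view(-1, 3, H, W)
    # Pixels that change a lot are treated as background, stable pixels as foreground.
    change = (img - shifted).abs().mean(dim=1, keepdim=True)
    mask = (change < thresh).float()
    return img, mask

# The synthetic (image, mask) pairs then supervise an ordinary segmentation
# network (a tiny conv net here; a U-Net would be a natural choice).
segmenter = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(segmenter.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):                        # toy training loop
    z = torch.randn(8, latent_dim)
    imgs, masks = synth_image_and_mask(z)
    loss = bce(segmenter(imgs), masks)
    opt.zero_grad()
    loss.backward()
    opt.step()
```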
Conclusion
Generative models provide a low-dimensional parametrization of the image manifold. Its geometry is connected to meaningful image transformations and opens the door to many applications.
References
- Andrey Voynov, Artem Babenko. Unsupervised Discovery of Interpretable Directions in the GAN Latent Space. ICML 2020.
- Anton Cherepkov, Andrey Voynov, Artem Babenko. Navigating the GAN Parameter Space for Semantic Image Editing. CVPR 2021.
- Andrey Voynov, Stanislav Morozov, Artem Babenko. Object Segmentation Without Labels with Large-Scale Generative Models. ICML 2021.