Few-shot Semantic Image Synthesis Using StyleGAN Prior

This paper tackles a challenging problem of generating photorealistic images from semantic layouts in few-shot scenarios where annotated training pairs are hardly available but pixel-wise annotation is quite costly. We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior. Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks. With such mappings, we can generate an unlimited number of pseudo semantic masks from random noise to train an encoder for controlling a pre-trained StyleGAN generator. Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles. Qualitative and quantitative results with various datasets demonstrate improvement over previous approaches with respect to layout fidelity and visual quality in as few as one- or five-shot settings.

https://arxiv.org/abs/2103.14877

本文专注于解决在少样本的场景下利用语义分割图生成高质量图像的任务,这样的任务中取得像素级的标签往往是困难的。我们提出一个训练策略,这个策略可以使用StyleGAN生成伪标签。我们的中心想法是使用少量建立起一个StyleGAN特征到每一个语义分类的映射。通过上述映射,我们可以使用随机噪声生成无限量的伪语义分割图以训练一个编码器,这个编码器回用来控制一个预训练的StyleGAN生成器。尽管之前的方法可能会因为伪标签太粗糙而无法生成高质量的图像因为它们需要像素级对应的标签,而我们的方法可以通过密集的伪标签且稀疏的面部特征来生成高质量的图像。实验证明我们的方法在少样本或者单样本生成任务中的性能提升。

发表评论

邮箱地址不会被公开。 必填项已用*标注