Tag Archive: StyleGAN

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

We introduce DatasetGAN: an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort. Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets, which are time consuming to annotate. Our method relies on the power of recent GANs to generate realistic images. We show how the GAN latent code can be decoded to produce a semantic segmentation of the image. Training the decoder only needs a few labeled examples to generalize to the rest of the latent space, resulting in an infinite annotated dataset generator! These generated datasets can then be used for training any computer vision architecture just as real datasets are. As only a few images need to be manually segmented, it becomes possible to annotate images in extreme detail and generate datasets with rich object and part segmentations. To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts. Our approach outperforms all semi-supervised baselines significantly and is on par with fully supervised methods, which in some cases require as much as 100x more annotated data than our method.

https://arxiv.org/abs/2104.06490

In this paper, we propose DatasetGAN, which can generate large amounts of labeled data for semantic segmentation tasks: the GAN's latent code can be decoded into a segmentation map, and training this decoder requires only a small number of annotated examples.
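
The core of the approach is easy to picture: a tiny per-pixel classifier is trained on upsampled StyleGAN feature maps using the handful of annotated synthetic images. The sketch below illustrates that idea under stated assumptions; the feature dimension, the pixel-feature tensors, and the training setup are placeholders, not the paper's exact pipeline.

```python
# A minimal sketch of the DatasetGAN idea: train a small per-pixel MLP
# ("style interpreter") on concatenated StyleGAN feature vectors to
# predict segmentation labels from only a few annotated images.
# `feats`/`labels` below are random stand-ins for real GAN features.
import torch
import torch.nn as nn

class PixelClassifier(nn.Module):
    """Classifies each pixel from its concatenated GAN feature vector."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, feats):              # feats: (n_pixels, feat_dim)
        return self.mlp(feats)

feat_dim, num_classes = 5056, 34           # e.g. 34 face-part labels
feats  = torch.randn(4096, feat_dim)       # placeholder pixel features
labels = torch.randint(0, num_classes, (4096,))

clf = PixelClassifier(feat_dim, num_classes)
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(clf(feats), labels)
    loss.backward()
    opt.step()
# Once trained, the classifier labels every pixel of every newly sampled
# image, turning the GAN into an annotated-dataset generator.
```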

Few-shot Semantic Image Synthesis Using StyleGAN Prior

This paper tackles a challenging problem of generating photorealistic images from semantic layouts in few-shot scenarios where annotated training pairs are hardly available but pixel-wise annotation is quite costly. We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior. Our key idea is to construct a simple mapping between the StyleGAN feature and each semantic class from a few examples of semantic masks. With such mappings, we can generate an unlimited number of pseudo semantic masks from random noise to train an encoder for controlling a pre-trained StyleGAN generator. Although the pseudo semantic masks might be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images from not only dense semantic masks but also sparse inputs such as landmarks and scribbles. Qualitative and quantitative results with various datasets demonstrate improvement over previous approaches with respect to layout fidelity and visual quality in as few as one- or five-shot settings.

https://arxiv.org/abs/2103.14877

This paper addresses the task of generating high-quality images from semantic layouts in few-shot scenarios, where pixel-wise labels are costly to obtain. We propose a training strategy that uses the StyleGAN prior to produce pseudo semantic masks. Our key idea is to build, from only a few annotated examples, a simple mapping between StyleGAN features and each semantic class. With these mappings, we can generate an unlimited number of pseudo semantic masks from random noise and use them to train an encoder that controls a pre-trained StyleGAN generator. Although such pseudo masks may be too coarse for previous approaches that require pixel-aligned masks, our framework can synthesize high-quality images not only from dense semantic masks but also from sparse inputs such as landmarks and scribbles. Experiments demonstrate improvements over previous methods in layout fidelity and visual quality in one- and five-shot settings.
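
One concrete way to realize "a simple mapping between the StyleGAN feature and each semantic class" is a nearest-class-prototype rule fit on a few labeled masks. The sketch below is an assumption about the general scheme, not the paper's exact mapping; the tensor shapes and class count are illustrative.

```python
# A minimal sketch of few-shot pseudo-labeling: fit one prototype
# feature vector per semantic class from a few labeled masks, then
# assign every pixel of freshly sampled images to its nearest class.
import torch

def fit_prototypes(feats, masks, num_classes):
    """feats: (N, C, H, W) GAN features; masks: (N, H, W) class ids."""
    C = feats.shape[1]
    protos = torch.zeros(num_classes, C)
    for k in range(num_classes):
        sel = feats.permute(0, 2, 3, 1)[masks == k]   # (n_k, C)
        if len(sel) > 0:
            protos[k] = sel.mean(0)
    return protos

def pseudo_label(feats, protos):
    """Label each pixel with its nearest class prototype."""
    flat = feats.permute(0, 2, 3, 1)                  # (N, H, W, C)
    d = torch.cdist(flat.reshape(-1, flat.shape[-1]), protos)
    return d.argmin(-1).reshape(flat.shape[:3])       # (N, H, W)

# Few-shot fit on placeholder features/masks ...
feats = torch.randn(5, 64, 32, 32)
masks = torch.randint(0, 8, (5, 32, 32))
protos = fit_prototypes(feats, masks, num_classes=8)
# ... then pseudo-label features of unlimited new samples.
new_masks = pseudo_label(torch.randn(100, 64, 32, 32), protos)
```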

SWAGAN: A Style-based Wavelet-driven Generative Model

In recent years, considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). Even so, these networks still suffer from degradation in quality for high-frequency content, stemming from a spectrally biased architecture, and similarly unfavorable loss functions. To address this issue, we present a novel general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain. SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach yields enhancements in the visual quality of the generated images, and considerably increases computational performance. We demonstrate the advantage of our method by integrating it into the StyleGAN2 framework, and verifying that content generation in the wavelet domain leads to higher quality images with more realistic high-frequency content. Furthermore, we verify that our model’s latent space retains the qualities that allow StyleGAN to serve as a basis for a multitude of editing tasks, and show that our frequency-aware approach also induces improved downstream visual quality.

https://arxiv.org/abs/2102.06108

Recently, the quality of GAN-generated images has improved markedly. Nevertheless, GANs still struggle with high-frequency content, a problem caused by spectrally biased architectures and similarly unfavorable loss functions. To address this, we present SWAGAN, a general-purpose style- and wavelet-based GAN that performs progressive generation in the frequency domain. SWAGAN incorporates wavelets throughout its generator and discriminator, forcing the network to learn a frequency-aware latent representation at every step. This design improves the visual quality of generated images while saving computation. We demonstrate the advantage of integrating our method into the StyleGAN2 framework and verify that generating content in the wavelet domain yields higher-quality images with more realistic high-frequency detail. We further verify that our model's latent space retains the properties that let StyleGAN serve as a basis for subsequent editing tasks, showing that our frequency-aware approach also improves downstream visual quality.
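
"Generation in the frequency domain" here means the network predicts wavelet sub-bands and the image is recovered by an inverse wavelet transform, so high frequencies are modeled explicitly. The toy inverse Haar step below illustrates that general scheme under stated assumptions; it is not SWAGAN's exact layer design.

```python
# A minimal sketch of wavelet-domain generation: a generator block
# outputs four Haar sub-bands (LL, LH, HL, HH) and an orthonormal
# inverse Haar transform reconstructs the image at double resolution.
import torch

def inverse_haar(ll, lh, hl, hh):
    """Reconstruct a (N, C, 2H, 2W) image from (N, C, H, W) sub-bands."""
    a = (ll + lh + hl + hh) / 2
    b = (ll - lh + hl - hh) / 2
    c = (ll + lh - hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    n, ch, h, w = ll.shape
    out = torch.zeros(n, ch, 2 * h, 2 * w)
    out[:, :, 0::2, 0::2] = a
    out[:, :, 0::2, 1::2] = b
    out[:, :, 1::2, 0::2] = c
    out[:, :, 1::2, 1::2] = d
    return out

# In the real model the sub-bands come from the generator; fake them here.
subbands = [torch.randn(1, 3, 128, 128) for _ in range(4)]
img = inverse_haar(*subbands)            # (1, 3, 256, 256)
```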

This Face Does Not Exist … But It Might Be Yours! Identity Leakage in Generative Models

Generative adversarial networks (GANs) are able to generate high resolution photo-realistic images of objects that “do not exist.” These synthetic images are rather difficult to detect as fake. However, the manner in which these generative models are trained hints at a potential for information leakage from the supplied training data, especially in the context of synthetic faces. This paper presents experiments suggesting that identity information in face images can flow from the training corpus into synthetic samples without any adversarial actions when building or using the existing model. This raises privacy-related questions, but also stimulates discussions of (a) the face manifold’s characteristics in the feature space and (b) how to create generative models that do not inadvertently reveal identity information of real subjects whose images were used for training. We used five different face matchers (face_recognition, FaceNet, ArcFace, SphereFace and Neurotechnology MegaMatcher) and the StyleGAN2 synthesis model, and show that this identity leakage does exist for some, but not all methods. So, can we say that these synthetically generated faces truly do not exist? Databases of real and synthetically generated faces are made available with this paper to allow full replicability of the results discussed in this work.

https://arxiv.org/abs/2101.05084

GANs are widely believed to generate high-resolution, photorealistic faces that "do not exist." However, the way these models are trained hints at a potential for leakage from the training data, especially in the context of synthetic faces. This paper shows that identity information in face images can flow from the training corpus into synthetic samples without any adversarial actions when building or using the model. This raises two questions: (a) what form the face manifold takes in feature space, and (b) how to build a generative model that does not leak the identity information of real subjects. Using five face matchers (face_recognition, FaceNet, ArcFace, SphereFace and Neurotechnology MegaMatcher) and the StyleGAN2 synthesis model, we show that this leakage exists for some, but not all, of the methods. So can we still say that generated faces absolutely do not exist? The datasets used in this work will be released for the community.
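
A leakage test of this kind can be run with any off-the-shelf matcher by embedding both corpora and checking nearest-neighbor distances. The sketch below uses the `face_recognition` package (one of the five matchers the paper evaluates); the file paths and the 0.6 threshold are illustrative assumptions, not the paper's protocol.

```python
# A minimal sketch of one identity-leakage check: embed training and
# synthetic faces, then flag synthetic faces whose nearest training
# face falls under a verification threshold.
import glob
import numpy as np
import face_recognition

def encode(paths):
    encs = []
    for p in paths:
        img = face_recognition.load_image_file(p)
        found = face_recognition.face_encodings(img)  # 128-d per face
        if found:
            encs.append(found[0])
    return np.array(encs)

train = encode(glob.glob("train_faces/*.png"))        # hypothetical dirs
synth = encode(glob.glob("stylegan2_samples/*.png"))

for i, s in enumerate(synth):
    dists = face_recognition.face_distance(train, s)
    j = int(dists.argmin())
    if dists[j] < 0.6:  # common verification threshold for this matcher
        print(f"synthetic {i} matches training face {j} (d={dists[j]:.3f})")
```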

VOGUE: Try-On by StyleGAN Interpolation Optimization

Given an image of a target person and an image of another person wearing a garment, we automatically generate the target person in the given garment. At the core of our method is a pose-conditioned StyleGAN2 latent space interpolation, which seamlessly combines the areas of interest from each image, i.e., body shape, hair, and skin color are derived from the target person, while the garment with its folds, material properties, and shape comes from the garment image. By automatically optimizing for interpolation coefficients per layer in the latent space, we can perform a seamless, yet true to source, merging of the garment and target person. Our algorithm allows for garments to deform according to the given body shape, while preserving pattern and material details. Experiments demonstrate state-of-the-art photo-realistic results at high resolution (512×512).

https://arxiv.org/abs/2101.02285

Given an image of a target person and an image of another person wearing a garment, we automatically generate an image of the target person wearing that garment. The core of our method is a pose-conditioned StyleGAN2 latent-space interpolation that seamlessly combines the regions of interest from each image: body shape, hair, and skin color come from the target person, while the garment's folds, material properties, and shape come from the garment image. By automatically optimizing a per-layer interpolation coefficient in the latent space, we achieve a seamless yet source-faithful merging of garment and person. Our algorithm lets garments deform to the given body shape while preserving pattern and material details. Experiments show state-of-the-art photorealistic results at high resolution (512×512).
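
The "per-layer interpolation coefficient" is the whole trick: one learnable mixing weight per StyleGAN2 layer decides whether that layer's style comes from the person or the garment code. The sketch below shows the optimization skeleton under stated assumptions; `generator`, the loss terms, and the layer count are placeholders for the real pose-conditioned pipeline.

```python
# A minimal sketch of per-layer latent interpolation: mix the person's
# and the garment image's W+ codes with one learnable coefficient per
# layer, optimized against region-masked identity/garment losses.
import torch

num_layers, w_dim = 18, 512
w_person  = torch.randn(num_layers, w_dim)   # projected target person
w_garment = torch.randn(num_layers, w_dim)   # projected garment image
alpha = torch.zeros(num_layers, requires_grad=True)  # per-layer mixing

opt = torch.optim.Adam([alpha], lr=0.01)
for _ in range(200):
    a = torch.sigmoid(alpha).unsqueeze(1)        # keep weights in (0, 1)
    w_mix = (1 - a) * w_person + a * w_garment   # (num_layers, w_dim)
    # img  = generator.synthesis(w_mix.unsqueeze(0))  # pose-conditioned G
    # loss = identity_loss(img) + garment_loss(img)   # region-masked terms
    loss = w_mix.pow(2).mean()   # dummy stand-in so the sketch runs
    opt.zero_grad(); loss.backward(); opt.step()
```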

GuidedStyle: Attribute Knowledge Guided Style Manipulation for Semantic Face Editing

Although significant progress has been made in synthesizing high-quality and visually realistic face images by unconditional Generative Adversarial Networks (GANs), there is still a lack of control over the generation process needed to achieve semantic face editing. In addition, it remains very challenging to maintain other face information untouched while editing the target attributes. In this paper, we propose a novel learning framework, called GuidedStyle, to achieve semantic face editing on StyleGAN by guiding the image generation process with a knowledge network. Furthermore, we allow an attention mechanism in StyleGAN generator to adaptively select a single layer for style manipulation. As a result, our method is able to perform disentangled and controllable edits along various attributes, including smiling, eyeglasses, gender, mustache and hair color. Both qualitative and quantitative results demonstrate the superiority of our method over other competing methods for semantic face editing. Moreover, we show that our model can also be applied to different types of real and artistic face editing, demonstrating strong generalization ability.

https://arxiv.org/pdf/2012.11856v1.pdf

Although unconditional GANs have made great progress in high-quality image synthesis, the generation process still lacks the control needed for tasks such as semantic face editing. Moreover, preserving the non-edited regions while editing a face image remains challenging. In this paper, we propose GuidedStyle, a new framework for semantic face editing that builds on StyleGAN and guides the image generation process with a knowledge network. We also use an attention mechanism so that the StyleGAN generator adaptively selects a single layer for style manipulation. As a result, our method performs disentangled, controllable edits along various attributes, including smiling, eyeglasses, gender, mustache, and hair color. Qualitative and quantitative results show that our method outperforms competing approaches for semantic face editing, and the model also generalizes to real and artistic face editing.
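
The "single layer for style manipulation" can be pictured as an attention vector over layers whose near-one-hot softmax decides where an attribute edit direction is applied. The sketch below is a simplified illustration under stated assumptions; the edit direction, temperature, and layer count are hypothetical, and the real scores are learned jointly with the knowledge network.

```python
# A minimal sketch of attention-based layer selection: score each
# StyleGAN layer, sharpen the scores into a near one-hot distribution,
# and apply the attribute direction mostly to the selected layer.
import torch

num_layers, w_dim = 18, 512
w = torch.randn(num_layers, w_dim)       # W+ code of an image
direction = torch.randn(w_dim)           # e.g. a "smiling" direction
scores = torch.randn(num_layers)         # learned in training; random here

tau = 0.1                                # low temperature -> near one-hot
attn = torch.softmax(scores / tau, dim=0)        # (num_layers,)
w_edited = w + attn.unsqueeze(1) * direction     # edit lands on ~one layer
```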