Tag Archives: GANs

On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation

We investigate the sensitivity of the Fréchet Inception Distance (FID) score to inconsistent and often incorrect implementations across different image processing libraries. The FID score is widely used to evaluate generative models, but each FID implementation uses a different low-level image processing pipeline. Image resizing functions in commonly-used deep learning libraries often introduce aliasing artifacts. We observe that numerous subtle choices need to be made for FID calculation and a lack of consistency in these choices can lead to vastly different FID scores. In particular, we show that the following choices are significant: (1) selecting which image resizing library to use, (2) choosing which interpolation kernel to use, and (3) which encoding to use when representing images. We additionally outline numerous common pitfalls that should be avoided and provide recommendations for computing the FID score accurately. We provide an easy-to-use optimized implementation of our proposed recommendations in the accompanying code.

https://arxiv.org/abs/2104.11222

We find that the FID score is sensitive to inconsistent and often incorrect implementations across different image processing libraries. Although FID is a widely used standard for evaluating generative models, each library implements it with different low-level image processing. We observe that image resizing operations in deep learning libraries can introduce aliasing artifacts. Several subtle choices therefore have to be made when computing FID: (1) which library to use for image resizing; (2) which interpolation kernel to use; (3) which encoding to use when storing images.
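
As a concrete illustration of why the resizing choice matters, here is a numpy-only sketch (not the paper's code) that downsamples a high-frequency checkerboard two ways: naive subsampling with no prefilter, which is effectively what an aliasing-prone resize does, versus box-filter averaging, which prefilters before subsampling. The two "4x smaller" results disagree completely:

```python
import numpy as np

def checkerboard(n, period):
    """High-frequency checkerboard test image with values in {0, 1}."""
    idx = np.arange(n)
    return ((idx[:, None] // period + idx[None, :] // period) % 2).astype(float)

def resize_naive(img, factor):
    """Downsample by striding: no prefilter, so frequencies above the
    new Nyquist rate alias into the output."""
    return img[::factor, ::factor]

def resize_box(img, factor):
    """Box-filter (area) downsampling: average each factor x factor block,
    which low-pass filters the signal before subsampling."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

img = checkerboard(256, period=1)   # alternates every pixel; overall mean 0.5
naive = resize_naive(img, 4)        # keeps only one phase of the pattern
box = resize_box(img, 4)            # averages to a flat 0.5 image

print(naive.mean(), box.mean())     # 0.0 vs 0.5
```

Any statistic computed downstream of the resize, FID included, inherits this kind of discrepancy, which is why pinning down the resizing library and interpolation kernel matters.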

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

We introduce DatasetGAN: an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort. Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets, which are time-consuming to annotate. Our method relies on the power of recent GANs to generate realistic images. We show how the GAN latent code can be decoded to produce a semantic segmentation of the image. Training the decoder only needs a few labeled examples to generalize to the rest of the latent space, resulting in an infinite annotated dataset generator! These generated datasets can then be used for training any computer vision architecture just as real datasets are. As only a few images need to be manually segmented, it becomes possible to annotate images in extreme detail and generate datasets with rich object and part segmentations. To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts. Our approach outperforms all semi-supervised baselines significantly and is on par with fully supervised methods, which in some cases require as much as 100x more annotated data than our method.

https://arxiv.org/abs/2104.06490

In this paper, we introduce DatasetGAN, which can generate large amounts of labeled data for semantic segmentation tasks. The GAN's latent code can be decoded into a segmentation map, and training the decoder requires only a small number of annotated examples.
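
The few-shot decoder idea can be caricatured in a few lines of numpy. Below, synthetic per-pixel feature vectors stand in for the generator's intermediate activations (the real method decodes StyleGAN features), and a tiny logistic-regression "decoder" fitted on only 40 labeled pixels generalizes to thousands of unlabeled ones; all sizes and names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-pixel GAN feature vectors; in DatasetGAN these come
# from the generator's intermediate activations.
n_pixels, n_feats = 5000, 16
feats = rng.normal(size=(n_pixels, n_feats))
true_w = rng.normal(size=n_feats)
labels = (feats @ true_w > 0).astype(float)   # ground-truth part labels

# "Minimal human effort": annotate only a handful of pixels...
n_labeled = 40
Xl, yl = feats[:n_labeled], labels[:n_labeled]

# ...and fit a tiny decoder (logistic regression via gradient descent).
w = np.zeros(n_feats)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Xl @ w)))
    w -= 0.5 * Xl.T @ (p - yl) / n_labeled

# The decoder generalizes to the unlabeled remainder, yielding "free"
# part labels for every generated pixel.
pred = (feats @ w > 0).astype(float)
acc = (pred == labels).mean()
print(f"accuracy on all {n_pixels} pixels from {n_labeled} labels: {acc:.2f}")
```

Because the generator can produce unlimited images, pairing it with such a decoder turns it into an annotated-dataset factory.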

HumanGAN: A Generative Model of Human Images

Generative adversarial networks achieve great performance in photorealistic image synthesis in various domains, including human images. However, they usually employ latent vectors that encode the sampled outputs globally. This does not allow convenient control of semantically-relevant individual parts of the image, and is not able to draw samples that only differ in partial aspects, such as clothing style. We address these limitations and present a generative model for images of dressed humans offering control over pose, local body part appearance and garment style. This is the first method to solve various aspects of human image generation such as global appearance sampling, pose transfer, parts and garment transfer, and parts sampling jointly in a unified framework. As our model encodes part-based latent appearance vectors in a normalized pose-independent space and warps them to different poses, it preserves body and clothing appearance under varying posture. Experiments show that our flexible and general generative method outperforms task-specific baselines for pose-conditioned image generation, pose transfer and part sampling in terms of realism and output resolution.

https://arxiv.org/abs/2103.06902

Generative adversarial networks have brought image synthesis to many applications with impressive results. However, they usually encode the sampled output with a global latent vector, which makes editing semantically meaningful individual parts inconvenient and offers no control over partial attributes such as clothing style. We address these limitations with a new generative model that offers control over pose, local body part appearance, and garment style. It is the first method to jointly solve several aspects of human image generation in a unified framework: global appearance sampling, pose transfer, part and garment transfer, and part sampling. Our model encodes part-based latent appearance vectors in a normalized, pose-independent space and then warps them to different poses, so body and clothing appearance are preserved under varying posture. Experiments show that our model achieves excellent performance on pose-conditioned image generation, pose transfer, and part sampling.
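
A cartoon of the part-based idea, in plain numpy and with entirely hypothetical names and shapes: each body part keeps an appearance code in a normalized, pose-independent space, and "rendering" a pose only re-places (warps) those codes, so appearance survives pose changes and a single part can be resampled independently:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pose-independent appearance codes, one per body part (toy 8-dim vectors).
appearance = {part: rng.normal(size=8) for part in ("head", "torso", "legs")}

def render(pose, appearance):
    """Stand-in for the learned warp: place each part's code at its
    pose-dependent cell on a coarse 4x4 grid."""
    canvas = np.zeros((4, 4, 8))
    for part, (row, col) in pose.items():
        canvas[row, col] = appearance[part]
    return canvas

pose_a = {"head": (0, 1), "torso": (1, 1), "legs": (2, 1)}   # pose A
pose_b = {"head": (0, 2), "torso": (2, 2), "legs": (3, 2)}   # pose B

img_a = render(pose_a, appearance)
img_b = render(pose_b, appearance)

# Pose transfer preserves appearance: the torso code is identical in both.
assert np.allclose(img_a[1, 1], img_b[2, 2])

# Part sampling: redraw only the garment (torso) code, keep the rest fixed.
appearance["torso"] = rng.normal(size=8)
img_c = render(pose_a, appearance)
assert np.allclose(img_c[0, 1], img_a[0, 1])        # head unchanged
assert not np.allclose(img_c[1, 1], img_a[1, 1])    # torso resampled
```

The real model replaces the dictionary with learned latent vectors and the grid placement with a differentiable warp, but the separation of "what a part looks like" from "where the pose puts it" is the same.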

Training Generative Adversarial Networks in One Stage

Generative Adversarial Networks (GANs) have demonstrated unprecedented success in various image generation tasks. The encouraging results, however, come at the price of a cumbersome training process, during which the generator and discriminator are alternately updated in two stages. In this paper, we investigate a general training scheme that enables training GANs efficiently in only one stage. Based on the adversarial losses of the generator and discriminator, we categorize GANs into two classes, Symmetric GANs and Asymmetric GANs, and introduce a novel gradient decomposition method to unify the two, allowing us to train both classes in one stage and hence alleviate the training effort. Computational analysis and experimental results on several datasets and various network architectures demonstrate that the proposed one-stage training scheme yields a solid 1.5× acceleration over conventional training schemes, regardless of the network architectures of the generator and discriminator. Furthermore, we show that the proposed method is readily applicable to other adversarial-training scenarios, such as data-free knowledge distillation.

https://arxiv.org/pdf/2103.00430.pdf

Generative Adversarial Networks (GANs) have shown unprecedented success in various image generation tasks. However, this success comes at the price of a cumbersome training process, in which the generator and discriminator are alternately updated in two stages. In this paper, we propose a one-stage training scheme for GANs. Classifying GANs by the type of their adversarial losses, we divide them into Symmetric GANs and Asymmetric GANs, and introduce a novel gradient decomposition method that unifies the two, allowing both classes to be trained in a single stage. Computational analysis and experimental results show that one-stage training yields a 1.5× speedup.
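
The decomposition trick for the symmetric case can be seen in a scalar toy example (entirely illustrative, not the paper's code). With a WGAN-style loss the generator's loss is minus the fake term of the discriminator's loss, so the backward quantity dL_D/dfake, computed once, can be reused with a sign flip for the generator's update instead of running a second backward pass:

```python
import numpy as np

# Toy symmetric adversarial pair with scalar parameters:
#   G(z) = z + g,   D(x) = d * x
#   L_D = D(fake) - D(real)    (discriminator minimizes this)
#   L_G = -D(fake)             (generator loss = minus the fake term of L_D)
# Since dL_G/dfake = -dL_D/dfake, one backward pass through D(fake)
# yields both updates: the gradient-decomposition idea in miniature.

def one_stage_grads(g, d, z, real):
    fake = z + g
    grad_fake = d                 # dL_D/dfake, computed once and shared
    grad_d = fake - real          # dL_D/dd
    grad_g = -grad_fake * 1.0     # dL_G/dg, reusing grad_fake with a sign flip
    return grad_g, grad_d

def two_stage_grads(g, d, z, real):
    # Conventional scheme: two separate backward passes.
    fake = z + g
    grad_d = fake - real          # backward pass 1, for L_D
    grad_g = -d                   # backward pass 2, for L_G = -D(fake)
    return grad_g, grad_d

g, d, z, real = 0.3, -0.7, 1.2, 2.5
assert np.allclose(one_stage_grads(g, d, z, real),
                   two_stage_grads(g, d, z, real))
```

The one-stage gradients match the conventional two-stage ones exactly; the saving comes from sharing the forward pass and the backward quantities, which is where the reported 1.5× acceleration originates.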