Tag archive: GAN

TransGAN: Two Transformers Can Make One Strong GAN

The recent explosive interest in transformers has suggested their potential to become powerful “universal” models for computer vision tasks, such as classification, detection, and segmentation. However, how much further can transformers go – are they ready to take on some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate TransGAN to notably benefit from data augmentations (more than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up with bigger models and high-resolution image datasets, and our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones. Specifically, TransGAN sets a new state-of-the-art IS score of 10.10 and FID score of 25.32 on STL-10. It also reaches a competitive 8.64 IS score and 11.89 FID score on CIFAR-10, and a 12.23 FID score on CelebA 64×64, respectively. We also conclude with a discussion of the current limitations and future potential of TransGAN.

https://arxiv.org/abs/2102.07074v2

The recent explosive interest in transformers demonstrates their potential to become universal models for computer vision tasks such as classification, detection, and segmentation. But how much further can transformers go? Are they ready to tackle notoriously difficult vision tasks such as GANs? Driven by that curiosity, we build the first GAN that is completely free of convolutions, composed purely of transformers. Our architecture, dubbed TransGAN, has two parts: a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing the embedding dimension, and a patch-level transformer-based discriminator. We then show that TransGAN benefits from data augmentation more than other GANs do. We also propose a multi-task co-training strategy for the generator, together with a locally initialized self-attention that lets the generator respect the neighborhood smoothness of natural images. With these findings, TransGAN scales to larger models and higher-resolution datasets, and experiments show it achieves state-of-the-art performance.
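To make the resolution-up, dimension-down generator design concrete, here is a minimal PyTorch sketch of one generator stage, assuming a standard transformer encoder layer followed by pixel-shuffle upsampling; the layer sizes and module names are illustrative, not the authors' exact configuration.

import torch
import torch.nn as nn

class UpsampleStage(nn.Module):
    """One TransGAN-style stage: attend at the current scale, then trade
    embedding channels for spatial resolution via pixel shuffle."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True)
        self.shuffle = nn.PixelShuffle(2)  # doubles H and W, divides C by 4

    def forward(self, tokens, hw):
        h, w = hw
        b, n, c = tokens.shape
        x = self.attn(tokens)                      # (B, H*W, C)
        x = x.transpose(1, 2).reshape(b, c, h, w)  # token grid -> feature map
        x = self.shuffle(x)                        # (B, C/4, 2H, 2W)
        return x.flatten(2).transpose(1, 2), (2 * h, 2 * w)

tokens = torch.randn(1, 8 * 8, 256)              # 8x8 grid of 256-dim tokens
tokens, hw = UpsampleStage(256)(tokens, (8, 8))  # -> 16x16 grid, 64-dim tokens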

GuidedStyle: Attribute Knowledge Guided Style Manipulation for Semantic Face Editing

Although significant progress has been made in synthesizing high-quality and visually realistic face images by unconditional Generative Adversarial Networks (GANs), there is still a lack of control over the generation process needed to achieve semantic face editing. In addition, it remains very challenging to keep other face information untouched while editing the target attributes. In this paper, we propose a novel learning framework, called GuidedStyle, to achieve semantic face editing on StyleGAN by guiding the image generation process with a knowledge network. Furthermore, we allow an attention mechanism in the StyleGAN generator to adaptively select a single layer for style manipulation. As a result, our method is able to perform disentangled and controllable edits along various attributes, including smiling, eyeglasses, gender, mustache and hair color. Both qualitative and quantitative results demonstrate the superiority of our method over other competing methods for semantic face editing. Moreover, we show that our model can also be applied to different types of real and artistic face editing, demonstrating strong generalization ability.

https://arxiv.org/pdf/2012.11856v1.pdf

Although unconditional GANs have made great progress in high-quality image synthesis, the generation process still lacks the control needed for tasks such as semantic face editing. Moreover, keeping the non-edited face information untouched while editing target attributes remains a challenge. In this paper, we propose a new framework for semantic face editing, GuidedStyle, which builds on StyleGAN and guides the image generation process with a knowledge network. We further use an attention mechanism so that the StyleGAN generator can adaptively select a single layer for style manipulation. The results show that our method can perform disentangled edits along various attributes in face editing tasks, including smiling, eyeglasses, gender, mustache, and hair color.
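As a rough illustration of the adaptive layer-selection idea (a hypothetical simplification, not the paper's exact mechanism), an attention distribution over generator layers can pick the single layer whose style vector is shifted along an attribute direction:

import torch
import torch.nn.functional as F

def edit_styles(styles, direction, layer_logits, strength=1.0):
    """styles: (num_layers, style_dim); direction: (style_dim,).
    layer_logits would be produced by the knowledge network; here they
    are random placeholders."""
    weights = F.softmax(layer_logits, dim=0)   # attention over layers
    layer = int(weights.argmax())              # hard-select a single layer
    edited = styles.clone()
    edited[layer] = styles[layer] + strength * direction
    return edited, layer

styles = torch.randn(14, 512)      # 14 layers, 512-dim style vectors
direction = torch.randn(512)       # e.g. a "smiling" direction (placeholder)
logits = torch.randn(14)           # placeholder layer-selection logits
edited, chosen = edit_styles(styles, direction, logits)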

Navigating the GAN Parameter Space for Semantic Image Editing


Generative Adversarial Networks (GANs) are currently an indispensable tool for visual editing, being a standard component of image-to-image translation and image restoration pipelines. Furthermore, GANs are especially useful for controllable generation since their latent spaces contain a wide range of interpretable directions, well suited for semantic editing operations. By gradually changing latent codes along these directions, one can produce impressive visual effects, unattainable without GANs. 
In this paper, we significantly expand the range of visual effects achievable with the state-of-the-art models, like StyleGAN2. In contrast to existing works, which mostly operate by latent codes, we discover interpretable directions in the space of the generator parameters. By several simple methods, we explore this space and demonstrate that it also contains a plethora of interpretable directions, which are an excellent source of non-trivial semantic manipulations. The discovered manipulations cannot be achieved by transforming the latent codes and can be used to edit both synthetic and real images. We release our code and models and hope they will serve as a handy tool for further efforts on GAN-based image editing.

https://arxiv.org/abs/2011.13786

GANs have become an indispensable tool for visual editing and a standard component of image-to-image translation and image restoration pipelines. They are especially valuable for controllable generation, because their latent spaces contain a wide range of interpretable directions that semantic editing tasks can exploit. By gradually moving latent codes along these directions, one can produce striking visual effects that are unattainable without GANs. In this paper, we greatly expand the range of visual effects achievable with state-of-the-art models such as StyleGAN2. Unlike most existing work, which operates on latent codes, we look for interpretable directions in the space of the generator's parameters. With a few simple methods we explore this space and show that it also contains a plethora of interpretable directions, a rich source of non-trivial semantic manipulations. The discovered manipulations cannot be achieved by transforming latent codes, and they can be used to edit both synthetic and real images.
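A minimal sketch of what editing in parameter space looks like, assuming a PyTorch generator and a precomputed direction in weight space; the direction-discovery methods themselves are the paper's contribution and are omitted here:

import copy
import torch

def shift_parameters(generator, direction, alpha):
    """direction: dict mapping parameter names to direction tensors;
    alpha controls the edit strength."""
    edited = copy.deepcopy(generator)
    with torch.no_grad():
        for name, param in edited.named_parameters():
            if name in direction:
                param.add_(alpha * direction[name])
    return edited

Feeding the same latent code through the original and the shifted generator then yields the original image and its semantic edit.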

Teaching a GAN What Not to Learn

Generative adversarial networks (GANs) were originally envisioned as unsupervised generative models that learn to follow a target distribution. Variants such as conditional GANs, auxiliary-classifier GANs (ACGANs) project GANs on to supervised and semi-supervised learning frameworks by providing labelled data and using multi-class discriminators. In this paper, we approach the supervised GAN problem from a different perspective, one that is motivated by the philosophy of the famous Persian poet Rumi who said, “The art of knowing is knowing what to ignore.” In the GAN framework, we not only provide the GAN positive data that it must learn to model, but also present it with so-called negative samples that it must learn to avoid – we call this “The Rumi Framework.” This formulation allows the discriminator to represent the underlying target distribution better by learning to penalize generated samples that are undesirable – we show that this capability accelerates the learning process of the generator. We present a reformulation of the standard GAN (SGAN) and least-squares GAN (LSGAN) within the Rumi setting. The advantage of the reformulation is demonstrated by means of experiments conducted on MNIST, Fashion MNIST, CelebA, and CIFAR-10 datasets. Finally, we consider an application of the proposed formulation to address the important problem of learning an under-represented class in an unbalanced dataset. The Rumi approach results in substantially lower FID scores than the standard GAN frameworks while possessing better generalization capability.

https://arxiv.org/pdf/2010.15639.pdf

This paper proposes a GAN framework that is trained by penalizing undesirable generated samples: instead of only learning what to generate, the GAN also learns what not to generate.
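A hedged sketch of the core loss idea, assuming a standard binary cross-entropy GAN discriminator; the weighting between negative and generated samples is illustrative rather than the paper's exact formulation:

import torch
import torch.nn.functional as F

def rumi_d_loss(d_pos, d_neg, d_fake, neg_weight=0.5):
    """Discriminator loss: accept positive data, reject both the provided
    negative samples and the generator's samples (all inputs are logits)."""
    loss_pos = F.binary_cross_entropy_with_logits(d_pos, torch.ones_like(d_pos))
    loss_neg = F.binary_cross_entropy_with_logits(d_neg, torch.zeros_like(d_neg))
    loss_fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return loss_pos + neg_weight * loss_neg + (1 - neg_weight) * loss_fake

Because the discriminator explicitly penalizes samples that resemble the negative set, its gradients steer the generator away from the undesirable region of the data space.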

Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling


We study the problem of 3D object generation. We propose a novel framework, namely 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets. The benefits of our model are three-fold: first, the use of an adversarial criterion, instead of traditional heuristic criteria, enables the generator to capture object structure implicitly and to synthesize high-quality 3D objects; second, the generator establishes a mapping from a low-dimensional probabilistic space to the space of 3D objects, so that we can sample objects without a reference image or CAD models, and explore the 3D object manifold; third, the adversarial discriminator provides a powerful 3D shape descriptor which, learned without supervision, has wide applications in 3D object recognition. Experiments demonstrate that our method generates high-quality 3D objects, and our unsupervisedly learned features achieve impressive performance on 3D object recognition, comparable with those of supervised learning methods.

http://3dgan.csail.mit.edu/papers/3dgan_nips.pdf

In this paper, we study the problem of 3D object generation. We propose a 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space using volumetric convolutional networks. Our model has three advantages: (1) using an adversarial criterion rather than traditional heuristic criteria lets the network capture object structure implicitly and synthesize high-quality 3D objects; (2) the generator establishes a mapping from a low-dimensional probabilistic space to the space of 3D objects, so we can sample objects without reference images or CAD models; (3) the adversarial discriminator provides a powerful 3D shape descriptor that is learned without supervision and can be used for 3D object recognition. Experiments show that our method generates high-quality 3D objects, and our unsupervised features achieve performance on 3D object recognition comparable to supervised methods.
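For intuition, here is a minimal PyTorch sketch of such a volumetric generator, mapping a latent vector to a voxel occupancy grid with 3D transposed convolutions; the channel sizes are illustrative and the sketch stops at 32^3 rather than the paper's full output resolution:

import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose3d(200, 256, 4, 1, 0), nn.BatchNorm3d(256), nn.ReLU(),  # 4^3
    nn.ConvTranspose3d(256, 128, 4, 2, 1), nn.BatchNorm3d(128), nn.ReLU(),  # 8^3
    nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.BatchNorm3d(64), nn.ReLU(),    # 16^3
    nn.ConvTranspose3d(64, 1, 4, 2, 1), nn.Sigmoid(),                       # 32^3
)

z = torch.randn(1, 200, 1, 1, 1)   # sample from the probabilistic space
voxels = generator(z)              # (1, 1, 32, 32, 32) occupancy grid

Sampling different z vectors yields different shapes directly, with no reference image or CAD model, which is exactly the second advantage described above.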

Resolution Dependant GAN Interpolation for Controllable Image Synthesis Between Domains


GANs can generate photo-realistic images from the domain of their training data. However, those wanting to use them for creative purposes often want to generate imagery from a truly novel domain, a task which GANs are inherently unable to do. It is also desirable to have a level of control so that there is a degree of artistic direction rather than purely curation of random results. Here we present a method for interpolating between generative models of the StyleGAN architecture in a resolution dependant manner. This allows us to generate images from an entirely novel domain and do this with a degree of control over the nature of the output.

https://arxiv.org/pdf/2010.05334v1.pdf

GANs can generate photo-realistic images from the domain of their training data, but creative work often calls for images from a truly novel domain, which GANs cannot do on their own. Some control over the artistic direction is also desirable, rather than purely curating random results. In this paper, we present a resolution-dependent method for interpolating between generative models of the StyleGAN architecture. It allows us to generate images from an entirely novel domain with a degree of control over the nature of the output.
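The mechanism can be pictured as per-resolution weight blending between two generators (e.g. faces vs. cartoons), sketched below; the resolution_of parser is a hypothetical stand-in for model-specific parameter-name parsing:

import torch

def resolution_of(name):
    """Hypothetical parser: pull a layer resolution out of a parameter
    name such as 'blocks.64.conv.weight' -> 64 (model specific in practice)."""
    for part in name.split("."):
        if part.isdigit():
            return int(part)
    return None

def blend_models(model_a, model_b, res_to_weight):
    """res_to_weight e.g. {4: 0.0, 8: 0.0, 16: 0.0, 32: 1.0, 64: 1.0}:
    0 keeps model_a at that resolution, 1 takes model_b."""
    state, state_b = model_a.state_dict(), model_b.state_dict()
    with torch.no_grad():
        for name in state:
            w = res_to_weight.get(resolution_of(name), 1.0)
            state[name] = (1 - w) * state[name] + w * state_b[name]
    return state  # load with model_a.load_state_dict(state)

Keeping the low-resolution layers from one model and the high-resolution layers from the other preserves coarse structure (pose, layout) while swapping in the fine-scale appearance of the new domain.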

Exposing GAN-Generated Faces Using Inconsistent Corneal Specular Highlights


Sophisticated generative adversarial network (GAN) models are now able to synthesize highly realistic human faces that are difficult to discern from real ones visually. GAN synthesized faces have become a new form of online disinformation. In this work, we show that GAN synthesized faces can be exposed with the inconsistent corneal specular highlights between two eyes. We show that such artifacts exist widely and further describe a method to extract and compare corneal specular highlights from two eyes. Qualitative and quantitative evaluations of our method suggest its simplicity and effectiveness in distinguishing GAN synthesized faces.

https://arxiv.org/pdf/2009.11924.pdf

Recent GANs can synthesize highly realistic human faces that are hard to distinguish from real ones visually, and such fake faces are becoming a new form of online disinformation. In this paper, we show that GAN-synthesized faces can be exposed by the inconsistent corneal specular highlights between the two eyes. We find that these artifacts are widespread in GAN-synthesized images, and we describe a method to extract and compare the corneal specular highlights of both eyes. Experiments show that our method is simple and effective at identifying GAN-synthesized faces.
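A simplified sketch of the comparison step, assuming the corneal regions of both eyes have already been located (e.g. with a face landmark detector) and cropped to the same size; the threshold and the IoU score are illustrative choices, not the paper's exact pipeline:

import numpy as np

def highlight_mask(cornea_patch, thresh=0.9):
    """cornea_patch: grayscale crop in [0, 1]; highlights are near-white."""
    return cornea_patch > thresh

def highlight_iou(left_patch, right_patch):
    """Both patches must be aligned and resized to the same shape."""
    a, b = highlight_mask(left_patch), highlight_mask(right_patch)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0  # low IoU suggests a GAN face

The intuition: both eyes of a real face see the same scene, so their specular highlights should be similar in shape and position, while a GAN has no such geometric constraint.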

not-so-BigGAN: Generating High-Fidelity Images on a Small Compute Budget

BigGAN is the state-of-the-art in high-resolution image generation, successfully leveraging advancements in scalable computing and theoretical understanding of generative adversarial methods to set new records in conditional image generation. A major part of BigGAN’s success is due to its use of large mini-batch sizes during training in high dimensions. While effective, this technique requires an incredible amount of compute resources and/or time (256 TPU-v3 Cores), putting the model out of reach for the larger research community. In this paper, we present not-so-BigGAN, a simple and scalable framework for training deep generative models on high-dimensional natural images. Instead of modelling the image in pixel space like in BigGAN, not-so-BigGAN uses wavelet transformations to bypass the curse of dimensionality, reducing the overall compute requirement significantly. Through extensive empirical evaluation, we demonstrate that for a fixed compute budget, not-so-BigGAN converges several times faster than BigGAN, reaching competitive image quality with an order of magnitude lower compute budget (4 Tesla-V100 GPUs).

BigGAN is the state of the art in high-resolution image generation. A major part of its success comes from training with very large mini-batches in high dimensions, which demands an enormous amount of compute that most of the research community cannot afford. In this paper, the authors present not-so-BigGAN, a simple and scalable generative model for high-resolution natural images. Instead of modelling images in pixel space like BigGAN, not-so-BigGAN uses wavelet transforms to sidestep the curse of dimensionality, effectively reducing the compute requirement. Experiments show that the model converges several times faster than BigGAN and reaches comparable image quality with only 4 Tesla V100 GPUs.
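To see why wavelets shrink the problem, here is a small PyWavelets example: one level of the 2D discrete wavelet transform turns a 256x256 image into four 128x128 sub-bands and is exactly invertible. This only illustrates the transform; not-so-BigGAN's actual pipeline around it is more involved.

import numpy as np
import pywt

img = np.random.rand(256, 256)             # stand-in for a training image
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')  # approximation + 3 detail bands, 128x128 each
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
assert np.allclose(img, reconstructed)     # the transform loses nothing

A generative model can then work on the small approximation band (and recursively on the detail bands) instead of the full-resolution pixel array.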

LaDDer: Latent Data Distribution Modelling with a Generative Prior


In this paper, we show that the performance of a learnt generative model is closely related to the model’s ability to accurately represent the inferred latent data distribution, i.e. its topology and structural properties. We propose LaDDer to achieve accurate modelling of the latent data distribution in a variational autoencoder framework and to facilitate better representation learning. The central idea of LaDDer is a meta-embedding concept, which uses multiple VAE models to learn an embedding of the embeddings, forming a ladder of encodings. We use a non-parametric mixture as the hyper prior for the innermost VAE and learn all the parameters in a unified variational framework. From extensive experiments, we show that our LaDDer model is able to accurately estimate complex latent distribution and results in improvement in the representation quality. We also propose a novel latent space interpolation method that utilises the derived data distribution.

In this paper, we show that the performance of a learnt generative model is closely related to its ability to accurately represent the inferred latent data distribution, i.e. its topology and structural properties. We propose LaDDer to accurately model the latent data distribution in a variational autoencoder framework and to facilitate better representation learning. The central idea of LaDDer is a meta-embedding concept, which uses multiple VAE models to learn an embedding of the embeddings, forming a ladder of encodings. We use a non-parametric mixture as the hyper-prior for the innermost VAE and learn all the parameters in a unified variational framework. Extensive experiments show that the LaDDer model can accurately estimate complex latent distributions and improves representation quality. We also propose a novel latent-space interpolation method that utilises the derived data distribution.
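The ladder idea in miniature, with placeholder linear encoders standing in for full VAEs; the real model also learns a non-parametric mixture hyper-prior, which is omitted here:

import torch
import torch.nn as nn

enc1 = nn.Linear(784, 32)  # image  -> latent code (stand-in for VAE 1 encoder)
enc2 = nn.Linear(32, 8)    # latent -> latent-of-latents (stand-in for VAE 2)

x = torch.randn(16, 784)   # batch of flattened images
z1 = enc1(x)               # first-level embedding
z2 = enc2(z1)              # embedding of the embeddings: one rung up the ladder

Each additional rung models the distribution of the codes below it, so the innermost latent distribution can be represented much more faithfully than with a single fixed Gaussian prior.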

Paper: https://arxiv.org/pdf/2009.00088.pdf

Code: https://github.com/lin-shuyu/ladder-latent-data-distribution-modelling

Training Generative Adversarial Networks with Limited Data

Stylegan – Towards Data Science

NVIDIA

Paper: https://arxiv.org/pdf/2006.06676.pdf

Large datasets have driven the progress of generative models, but collecting an image set big enough for a specific application is challenging: the data is constrained by subject type, image quality, geographic location, time period, privacy, copyright status, and so on. CelebA, for instance, imposes strict requirements on face position, quality, and image size, and enforcing such requirements across hundreds of thousands of images is an enormous amount of work.

GAN training typically needs on the order of 10^5 to 10^6 samples, which is hard to reach for medical images and other small-sample settings. With too little data, the discriminator overfits the training set, its feedback to the generator becomes meaningless, and training diverges. The paper measures how dataset size affects generation quality, with results shown in the figure.

The cover figure shows baseline results on different subsets of FFHQ. In every case training starts out the same way, but as it progresses the FID begins to rise, and the less training data there is, the earlier this happens. Figures (b) and (c) show the discriminator's output distributions for real and generated images during training: the distributions overlap at first, but keep drifting apart as the discriminator grows more and more confident, and the point where FID starts to deteriorate coincides with the distributions losing sufficient overlap. As figure (c) shows, once the discriminator overfits, even a validation set of real images is scored like the generated distribution, which is strong evidence that the discriminator has overfit to the training data.

Since the overfitting stems from insufficient data, can it be fixed by simply augmenting the dataset (rotations, added noise)?

Dataset augmentation works well for discriminative tasks such as classifier training, but naively augmenting the data in a GAN causes "leakage": the GAN learns to generate the augmented data distribution. This paper uses a diverse set of augmentations to prevent discriminator overfitting while ensuring that the augmentations do not "leak" into the generated images. To summarize, the advantages of the ADA (adaptive discriminator augmentation) method for generative models (a sketch of the adaptive control loop follows the list):

  • ADA achieves good generation quality with few training samples
  • ADA augments the data while preventing "leakage" into the generated distribution
  • The adaptive discriminator augmentation keeps the model from overfitting easily, making training more stable
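A minimal sketch of the adaptive control loop, assuming the paper's r_t overfitting heuristic (the mean sign of the discriminator's outputs on real images); the target value and step size here are illustrative:

import torch

def update_p(p, d_real_logits, target=0.6, step=0.01):
    """Nudge the augmentation probability p toward keeping r_t near target."""
    r_t = torch.sign(d_real_logits).mean().item()  # near +1 when D is overconfident
    p += step if r_t > target else -step           # augment more when overfitting
    return min(max(p, 0.0), 1.0)

p = 0.0
for _ in range(5):                        # inside the training loop
    d_real_logits = torch.randn(64) + 1.0 # stand-in for D's outputs on a real batch
    p = update_p(p, d_real_logits)

Because p rises only as fast as the discriminator overfits, small datasets automatically get strong augmentation while large datasets stay close to unaugmented training.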