TransGAN: Two Transformers Can Make One Strong GAN

The recent explosive interest in transformers has suggested their potential to become powerful "universal" models for computer vision tasks, such as classification, detection, and segmentation. But how much further can transformers go – are they ready to take on some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate that TransGAN benefits notably from data augmentation (more than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up to bigger models and high-resolution image datasets. Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones: TransGAN sets a new state-of-the-art IS of 10.10 and FID of 25.32 on STL-10, and reaches a competitive IS of 8.64 and FID of 11.89 on CIFAR-10, and an FID of 12.23 on CelebA 64×64. We also conclude with a discussion of the current limitations and future potential of TransGAN.


The recent explosion of interest in transformers has demonstrated their potential to become "universal" models for computer vision tasks such as classification, detection, and segmentation. But how far can transformers go? Are they ready to tackle notoriously difficult vision tasks such as GANs? Driven by this curiosity, we build the first GAN that is completely convolution-free, composed entirely of transformers. Our architecture, dubbed TransGAN, consists of the following parts: a memory-friendly transformer-based generator, which progressively increases the feature resolution while decreasing the embedding dimension, and a patch-level transformer-based discriminator. We then show that, compared with other GANs, TransGAN can better exploit data augmentation to improve performance. We also propose a multi-task co-training strategy to better train the generator, together with a locally initialized self-attention mechanism that lets the generator perceive the neighborhood smoothness of natural images. With these findings, TransGAN scales to larger and higher-resolution datasets. Experiments show that TransGAN achieves state-of-the-art performance.
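The "progressively increase feature resolution while decreasing embedding dimension" design of the generator can be illustrated with a pixel-shuffle-style upsampling stage. The numpy sketch below is our own simplification under assumed shapes, not the authors' code:

```python
import numpy as np

def pixel_shuffle_stage(tokens, height, width):
    """One memory-friendly upsampling stage in the spirit of TransGAN's
    generator: double the spatial resolution while dividing the embedding
    dimension by 4. `tokens` has shape (height*width, dim)."""
    n, dim = tokens.shape
    assert n == height * width and dim % 4 == 0
    # Split each embedding into a 2x2 spatial block of dim/4 channels
    fmap = tokens.reshape(height, width, 2, 2, dim // 4)
    fmap = fmap.transpose(0, 2, 1, 3, 4)           # (H, 2, W, 2, dim/4)
    fmap = fmap.reshape(2 * height, 2 * width, dim // 4)
    return fmap.reshape(4 * n, dim // 4)           # 4x tokens, dim/4 channels

tokens = np.random.randn(8 * 8, 256)               # an 8x8 grid of 256-d tokens
up = pixel_shuffle_stage(tokens, 8, 8)
print(up.shape)                                    # (256, 64): a 16x16 grid of 64-d tokens
```

Stacking such stages keeps the token count × dimension product (and hence self-attention memory) from exploding as the resolution grows.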

GuidedStyle: Attribute Knowledge Guided Style Manipulation for Semantic Face Editing

Although significant progress has been made in synthesizing high-quality and visually realistic face images by unconditional Generative Adversarial Networks (GANs), there is still a lack of control over the generation process needed to achieve semantic face editing. In addition, it remains very challenging to keep other face information untouched while editing the target attributes. In this paper, we propose a novel learning framework, called GuidedStyle, to achieve semantic face editing on StyleGAN by guiding the image generation process with a knowledge network. Furthermore, we allow an attention mechanism in the StyleGAN generator to adaptively select a single layer for style manipulation. As a result, our method is able to perform disentangled and controllable edits along various attributes, including smiling, eyeglasses, gender, mustache and hair color. Both qualitative and quantitative results demonstrate the superiority of our method over other competing methods for semantic face editing. Moreover, we show that our model can also be applied to different types of real and artistic face editing, demonstrating strong generalization ability.


Although unconditional GANs have made great progress in high-quality image synthesis, the generation process still lacks control, for example for semantic face editing. In addition, preserving the information of non-edited regions while editing a face image remains a challenge. In this paper, we propose a new framework for semantic face editing, GuidedStyle, which builds on StyleGAN and guides the image generation process with a knowledge network. Moreover, we use an attention mechanism so that the StyleGAN generator can adaptively select a single layer for style manipulation. The results show that our method can edit along various attributes in face editing tasks, including smiling, eyeglasses, gender, mustache, and hair color.
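The "adaptively select a single layer" step can be pictured as a low-temperature softmax over learned per-layer scores, which approaches a one-hot choice of layer. This is a hypothetical sketch of the mechanism, not GuidedStyle's actual implementation; the scores and temperature below are invented:

```python
import numpy as np

def select_style_layer(layer_scores, temperature=0.1):
    """Soft layer selection: a softmax over per-layer scores whose low
    temperature pushes the weights toward one-hot, so style manipulation
    effectively lands on a single generator layer."""
    logits = np.asarray(layer_scores, dtype=float) / temperature
    logits -= logits.max()                      # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    return weights

w = select_style_layer([0.1, 0.3, 2.0, 0.2])    # scores for 4 generator layers
print(w.argmax())                               # layer 2 dominates
```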

Navigating the GAN Parameter Space for Semantic Image Editing


Generative Adversarial Networks (GANs) are currently an indispensable tool for visual editing, being a standard component of image-to-image translation and image restoration pipelines. Furthermore, GANs are especially useful for controllable generation since their latent spaces contain a wide range of interpretable directions, well suited for semantic editing operations. By gradually changing latent codes along these directions, one can produce impressive visual effects, unattainable without GANs. 
In this paper, we significantly expand the range of visual effects achievable with the state-of-the-art models, like StyleGAN2. In contrast to existing works, which mostly operate by latent codes, we discover interpretable directions in the space of the generator parameters. By several simple methods, we explore this space and demonstrate that it also contains a plethora of interpretable directions, which are an excellent source of non-trivial semantic manipulations. The discovered manipulations cannot be achieved by transforming the latent codes and can be used to edit both synthetic and real images. We release our code and models and hope they will serve as a handy tool for further efforts on GAN-based image editing.
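Mechanically, editing in parameter space means shifting the generator's weights along a discovered direction, in contrast to shifting a latent code. A minimal numpy sketch, with an invented name -> array layout for the weights:

```python
import numpy as np

def shift_generator_params(params, direction, alpha):
    """Move the generator weights along an interpretable direction with
    strength alpha. `params` and `direction` are hypothetical
    name -> array dicts; layers absent from `direction` are untouched."""
    return {name: w + alpha * direction.get(name, 0.0)
            for name, w in params.items()}

params = {"conv1.weight": np.zeros((4, 4)), "conv2.weight": np.ones((2, 2))}
direction = {"conv1.weight": np.ones((4, 4))}   # found, e.g., by a simple search
edited = shift_generator_params(params, direction, alpha=0.5)
print(edited["conv1.weight"][0, 0])             # 0.5
```

The same generator architecture is kept; only its weights move, which is why these edits apply equally to synthetic and (GAN-inverted) real images.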



Teaching a GAN What Not to Learn

Generative adversarial networks (GANs) were originally envisioned as unsupervised generative models that learn to follow a target distribution. Variants such as conditional GANs and auxiliary-classifier GANs (ACGANs) project GANs onto supervised and semi-supervised learning frameworks by providing labelled data and using multi-class discriminators. In this paper, we approach the supervised GAN problem from a different perspective, one that is motivated by the philosophy of the famous Persian poet Rumi who said, “The art of knowing is knowing what to ignore.” In the GAN framework, we not only provide the GAN positive data that it must learn to model, but also present it with so-called negative samples that it must learn to avoid – we call this “The Rumi Framework.” This formulation allows the discriminator to represent the underlying target distribution better by learning to penalize generated samples that are undesirable – we show that this capability accelerates the learning process of the generator. We present a reformulation of the standard GAN (SGAN) and least-squares GAN (LSGAN) within the Rumi setting. The advantage of the reformulation is demonstrated by means of experiments conducted on MNIST, Fashion MNIST, CelebA, and CIFAR-10 datasets. Finally, we consider an application of the proposed formulation to address the important problem of learning an under-represented class in an unbalanced dataset. The Rumi approach results in substantially lower FID scores than the standard GAN frameworks while possessing better generalization capability.
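A sketch of how the Rumi idea modifies the standard GAN discriminator objective: positive samples are pushed toward 1, while both negative samples and generated samples are pushed toward 0. The equal weighting of the three terms is our simplification; the paper's exact reformulation may differ:

```python
import numpy as np

def rumi_discriminator_loss(d_pos, d_neg, d_fake):
    """Binary cross-entropy sketch of the Rumi SGAN discriminator loss.
    d_pos / d_neg / d_fake are discriminator sigmoid outputs in (0, 1)
    on positive, negative, and generated samples respectively."""
    eps = 1e-12                                  # avoid log(0)
    loss_pos = -np.log(d_pos + eps).mean()       # positives -> 1
    loss_neg = -np.log(1.0 - d_neg + eps).mean() # negatives -> 0
    loss_fake = -np.log(1.0 - d_fake + eps).mean()  # fakes -> 0
    return loss_pos + loss_neg + loss_fake

loss = rumi_discriminator_loss(np.array([0.9]), np.array([0.1]), np.array([0.2]))
print(round(loss, 3))                            # 0.434
```

Because the discriminator explicitly penalizes the negative region, its gradients steer the generator away from the undesirable modes rather than merely toward the positive ones.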



Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling


We study the problem of 3D object generation. We propose a novel framework, namely 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets. The benefits of our model are three-fold: first, the use of an adversarial criterion, instead of traditional heuristic criteria, enables the generator to capture object structure implicitly and to synthesize high-quality 3D objects; second, the generator establishes a mapping from a low-dimensional probabilistic space to the space of 3D objects, so that we can sample objects without a reference image or CAD models, and explore the 3D object manifold; third, the adversarial discriminator provides a powerful 3D shape descriptor which, learned without supervision, has wide applications in 3D object recognition. Experiments demonstrate that our method generates high-quality 3D objects, and our unsupervisedly learned features achieve impressive performance on 3D object recognition, comparable with those of supervised learning methods.


In this paper we study the problem of 3D object generation. We propose a 3D generative adversarial network named 3D-GAN, which generates 3D objects from a probabilistic space with a volumetric network. Our model has three advantages: (1) using an adversarial criterion instead of traditional heuristic criteria lets the network implicitly capture object structure and generate high-quality 3D objects; (2) the generator establishes a mapping from a low-dimensional probabilistic space to the space of 3D objects, so we can generate 3D objects without reference images or CAD models; (3) the adversarial discriminator serves as an effective 3D shape descriptor that can be trained without supervision and used for 3D object recognition. Experiments show that our method generates high-quality 3D objects, and our unsupervised features achieve good results on 3D object recognition compared with supervised features.
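The core mapping is from a low-dimensional latent vector to a voxel occupancy grid. The toy numpy sketch below only illustrates the shapes involved; a fixed random projection plus nearest-neighbor upsampling stands in for the learned volumetric transposed-convolution network, and the 200-d latent size is the one reported for 3D-GAN:

```python
import numpy as np

def toy_3d_generator(z):
    """Toy stand-in for 3D-GAN's generator: latent z -> 64x64x64 voxels."""
    rng = np.random.default_rng(0)                 # fixed "weights" in place of training
    w = rng.standard_normal((z.size, 4 * 4 * 4))   # z -> 4^3 seed volume
    seed = (w.T @ z).reshape(4, 4, 4)
    # Nearest-neighbor upsample 4^3 -> 64^3 (the real model uses learned layers)
    voxels = seed.repeat(16, axis=0).repeat(16, axis=1).repeat(16, axis=2)
    return (voxels > 0).astype(np.uint8)           # occupancy in {0, 1}

z = np.random.default_rng(1).standard_normal(200)  # 200-d latent
vox = toy_3d_generator(z)
print(vox.shape)                                   # (64, 64, 64)
```

Because there is no reference image in the pipeline, sampling different `z` vectors directly explores the 3D object manifold.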

Resolution Dependant GAN Interpolation for Controllable Image Synthesis Between Domains

(NeurIPS 2020 Workshop; covered by Synced as "Indie GAN Interpolation Method Turns Selfies Into Cartoon Characters".)

GANs can generate photo-realistic images from the domain of their training data. However, those wanting to use them for creative purposes often want to generate imagery from a truly novel domain, a task which GANs are inherently unable to do. It is also desirable to have a level of control so that there is a degree of artistic direction rather than purely curation of random results. Here we present a method for interpolating between generative models of the StyleGAN architecture in a resolution-dependent manner. This allows us to generate images from an entirely novel domain and do this with a degree of control over the nature of the output.
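The idea can be sketched as combining two trained generators layer by layer, keyed on the resolution each layer operates at: coarse (structure) layers from one model, fine (texture/style) layers from the other. The hard swap below is the simplest case; a per-resolution interpolation weight generalizes it. The name -> resolution layout is our own simplification:

```python
import numpy as np

def blend_models(weights_a, weights_b, resolution, swap_at=32):
    """Resolution-dependent blend of two StyleGAN-like generators:
    layers below `swap_at` keep model A's weights (coarse structure),
    layers at or above it take model B's (fine texture/style)."""
    return {name: (weights_a[name] if resolution[name] < swap_at
                   else weights_b[name])
            for name in weights_a}

res = {"b8.conv": 8, "b64.conv": 64}                       # layer -> resolution
a = {"b8.conv": np.zeros(3), "b64.conv": np.zeros(3)}      # e.g. a faces model
b = {"b8.conv": np.ones(3), "b64.conv": np.ones(3)}        # e.g. a fine-tuned cartoon model
out = blend_models(a, b, res)
print(out["b8.conv"][0], out["b64.conv"][0])               # 0.0 1.0
```

Choosing `swap_at` is the control knob: a higher threshold preserves more of model A's identity while still borrowing model B's rendering style.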




Exposing GAN-generated Faces Using Inconsistent Corneal Specular Highlights

Sophisticated generative adversarial network (GAN) models are now able to synthesize highly realistic human faces that are difficult to discern from real ones visually. GAN-synthesized faces have become a new form of online disinformation. In this work, we show that GAN-synthesized faces can be exposed by the inconsistent corneal specular highlights between the two eyes. We show that such artifacts exist widely and further describe a method to extract and compare corneal specular highlights from the two eyes. Qualitative and quantitative evaluations of our method suggest its simplicity and effectiveness in distinguishing GAN-synthesized faces.
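The comparison step reduces to segmenting the highlight in each cornea and measuring how similar the two regions are. The sketch below assumes aligned binary highlight masks as input and uses IoU as the similarity score; the masks and any decision threshold are illustrative, not the paper's exact pipeline:

```python
import numpy as np

def highlight_similarity(mask_left, mask_right):
    """IoU between the corneal specular highlight masks of the two eyes
    (assumed already extracted and geometrically normalized). Real faces
    lit by one environment tend to score high; GAN faces often do not."""
    inter = np.logical_and(mask_left, mask_right).sum()
    union = np.logical_or(mask_left, mask_right).sum()
    return inter / union if union else 1.0

a = np.zeros((8, 8), bool); a[2:5, 2:5] = True   # left-eye highlight
b = np.zeros((8, 8), bool); b[2:5, 3:6] = True   # shifted right-eye highlight
iou = highlight_similarity(a, b)
print(round(iou, 3))                             # 0.5
```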



not-so-BigGAN: Generating High-Fidelity Images on a Small Compute Budget

BigGAN is the state-of-the-art in high-resolution image generation, successfully leveraging advancements in scalable computing and theoretical understanding of generative adversarial methods to set new records in conditional image generation. A major part of BigGAN’s success is due to its use of large mini-batch sizes during training in high dimensions. While effective, this technique requires an incredible amount of compute resources and/or time (256 TPU-v3 cores), putting the model out of reach for the larger research community. In this paper, we present not-so-BigGAN, a simple and scalable framework for training deep generative models on high-dimensional natural images. Instead of modelling the image in pixel space like BigGAN, not-so-BigGAN uses wavelet transformations to bypass the curse of dimensionality, reducing the overall compute requirement significantly. Through extensive empirical evaluation, we demonstrate that for a fixed compute budget, not-so-BigGAN converges several times faster than BigGAN, reaching competitive image quality with an order of magnitude lower compute budget (4 Tesla-V100 GPUs).

BigGAN is a state-of-the-art high-resolution image generator. Its success comes from training with large mini-batches in high dimensions, which requires considerable compute that many members of the research community cannot afford. In this paper the authors propose not-so-BigGAN, a simple and scalable generative model for high-resolution natural images. Instead of modelling images directly in pixel space as BigGAN does, not-so-BigGAN uses wavelet transforms to sidestep the curse of dimensionality, effectively reducing the compute requirement. Experiments show that the model converges several times faster than BigGAN and reaches similar image quality on only 4 Tesla-V100 GPUs.
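The dimensionality reduction comes from the wavelet transform itself: one level of a 2D Haar transform turns an HxW image into four (H/2)x(W/2) sub-bands, so generation can operate on the small approximation band instead of full pixel space. A self-contained numpy sketch of that one step (our illustration of the principle, not the paper's code):

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2D Haar wavelet transform: returns the approximation
    band plus horizontal, vertical, and diagonal detail bands, each at
    half the input resolution. `img` must have even height and width."""
    tl, tr = img[0::2, 0::2], img[0::2, 1::2]     # 2x2 block corners
    bl, br = img[1::2, 0::2], img[1::2, 1::2]
    a = (tl + tr + bl + br) / 4                   # approximation (local mean)
    h = (tl - tr + bl - br) / 4                   # horizontal detail
    v = (tl + tr - bl - br) / 4                   # vertical detail
    d = (tl - tr - bl + br) / 4                   # diagonal detail
    return a, h, v, d

img = np.arange(16.0).reshape(4, 4)
a, h, v, d = haar_dwt2(img)
print(a.shape)                                    # (2, 2): half the resolution per level
```

Applying the transform recursively shrinks the space the generator must model at each level, which is where the compute savings come from.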

LaDDer: Latent Data Distribution Modelling with a Generative Prior


In this paper, we show that the performance of a learnt generative model is closely related to the model’s ability to accurately represent the inferred latent data distribution, i.e. its topology and structural properties. We propose LaDDer to achieve accurate modelling of the latent data distribution in a variational autoencoder framework and to facilitate better representation learning. The central idea of LaDDer is a meta-embedding concept, which uses multiple VAE models to learn an embedding of the embeddings, forming a ladder of encodings. We use a non-parametric mixture as the hyper prior for the innermost VAE and learn all the parameters in a unified variational framework. From extensive experiments, we show that our LaDDer model is able to accurately estimate complex latent distribution and results in improvement in the representation quality. We also propose a novel latent space interpolation method that utilises the derived data distribution.
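The "embedding of the embeddings" ladder can be pictured as a chain of encoders, each compressing the previous level's code. In the sketch below, random linear maps with a tanh nonlinearity stand in for trained VAE encoders; the dimensions are invented for illustration:

```python
import numpy as np

def ladder_encode(x, encoders):
    """Encode x through a ladder of encoders, returning every level's
    code. Encoder i is a matrix mapping R^{d_i} -> R^{d_{i+1}}; the
    innermost (last) code is the one LaDDer models with a hyper prior."""
    codes, h = [], x
    for w in encoders:
        h = np.tanh(w @ h)          # stand-in for a trained VAE encoder
        codes.append(h)
    return codes

rng = np.random.default_rng(0)
encoders = [rng.standard_normal((16, 64)),   # 64-d data embedding -> 16-d
            rng.standard_normal((4, 16))]    # 16-d -> 4-d innermost code
codes = ladder_encode(rng.standard_normal(64), encoders)
print([c.shape for c in codes])              # [(16,), (4,)]
```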




Training Generative Adversarial Networks with Limited Data


  • ADA achieves fairly high-quality generation with limited training data
  • ADA prevents "leakage" of the augmentations into the generated images while still expanding the training data
  • The adaptive discriminator augmentation keeps the model from overfitting easily, making training more stable
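The "adaptive" part of ADA is a small feedback loop: an overfitting heuristic based on the discriminator's outputs on real images nudges the augmentation probability p up or down. The sketch below follows the spirit of that loop; the target and step constants are placeholders, not the paper's exact values:

```python
import numpy as np

def update_ada_p(p, d_real_sign_mean, target=0.6, step=0.01):
    """Adjust the augmentation probability p from the overfitting
    heuristic r_t = E[sign(D(real))]: raise p when the discriminator is
    too confident on reals (r_t above target), lower it otherwise."""
    p += step if d_real_sign_mean > target else -step
    return float(np.clip(p, 0.0, 1.0))           # p stays a valid probability

p = 0.1
p = update_ada_p(p, d_real_sign_mean=0.9)        # overfitting -> augment more
print(round(p, 2))                               # 0.11
```

Because every augmentation is applied with probability p < 1 and is differentiable, the generator sees enough un-augmented gradients that the augmentations do not leak into its samples.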