ManiGAN: Text-Guided Image Manipulation

The goal of our paper is to semantically edit parts of an image to match a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text. To achieve this, we propose a novel generative adversarial network (ManiGAN), which contains two key components: a text-image affine combination module (ACM) and a detail correction module (DCM). The ACM selects image regions relevant to the given text and then correlates the regions with corresponding semantic words for effective manipulation. Meanwhile, it encodes original image features to help reconstruct text-irrelevant contents. The DCM rectifies mismatched attributes and completes missing contents of the synthetic image. Finally, we suggest a new metric for evaluating image manipulation results, in terms of both the generation of new attributes and the reconstruction of text-irrelevant contents. Extensive experiments on the CUB and COCO datasets demonstrate the superior performance of the proposed method. Code is available at
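To make the idea of an affine combination concrete, here is a minimal NumPy sketch. It is not the authors' released code. It assumes that the fusion takes the form "text features times a scale map, plus a shift map", where both maps are predicted from the image features by 1x1 convolutions. The function name `affine_combination`, the weight names `w_scale` and `w_shift`, and the single-layer predictors are all hypothetical simplifications.

```python
import numpy as np


def affine_combination(text_feat, image_feat, w_scale, w_shift):
    """Hypothetical ACM-style fusion (an illustrative assumption).

    text_feat:  (C, H, W) hidden features derived from the text
    image_feat: (D, H, W) regional features of the input image
    w_scale:    (C, D) weights of an assumed 1x1 conv predicting the scale
    w_shift:    (C, D) weights of an assumed 1x1 conv predicting the shift
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    scale = np.einsum("cd,dhw->chw", w_scale, image_feat)
    shift = np.einsum("cd,dhw->chw", w_shift, image_feat)
    # Element-wise scaling lets image regions gate the text features;
    # the additive shift carries image content through for reconstruction.
    return text_feat * scale + shift
```

Under these assumptions, the multiplicative term is what ties text information to particular image regions, while the additive term gives the original image a direct path into the output.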

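The aim of this paper is to use text to semantically edit specific parts of an image (e.g., texture, colour, or background) while preserving other content that is unrelated to the text. To do this, we propose an advanced GAN (ManiGAN), which consists of two main parts: a text-image affine combination module (ACM) and a detail correction module (DCM). The ACM selects the parts of the image that correspond to the text and edits those regions according to the information in the text. It also extracts features of the original image to help reconstruct text-irrelevant content. The DCM corrects mismatched attributes and fills in missing parts of the synthesized image. Finally, we propose a new metric for evaluating image editing results, which reflects both the generation of new attributes and the reconstruction of text-irrelevant content. Experiments on the CUB and COCO datasets demonstrate the strong performance of the method.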
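The abstract does not give the formula for the proposed metric. As an illustration only, one plausible way to reward both properties in a single score is to multiply a reconstruction term by a text-image similarity term. The sketch below assumes images are float arrays in [0, 1] and that text and image embeddings come from some pretrained encoder that is not shown here. The name `manipulation_score` and the exact form of the score are assumptions, not the authors' definition.

```python
import numpy as np


def manipulation_score(original, edited, text_emb, image_emb):
    """Hypothetical score: (1 - pixel difference) * text-image similarity.

    original, edited: float image arrays in [0, 1] with identical shape
    text_emb:         1-D embedding of the guiding text (assumed given)
    image_emb:        1-D embedding of the edited image (assumed given)
    """
    # Mean absolute pixel difference: small when text-irrelevant content
    # of the original image is preserved.
    diff = float(np.mean(np.abs(original - edited)))
    # Cosine similarity: large when the edited image matches the text.
    sim = float(
        np.dot(text_emb, image_emb)
        / (np.linalg.norm(text_emb) * np.linalg.norm(image_emb))
    )
    return (1.0 - diff) * sim
```

With a product of this kind, an output that copies the input unchanged scores well on reconstruction but only as well on similarity as the original image already matches the text, and an output that matches the text but discards the original image is penalized by the difference term.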

