Tag Archive: Data Augmentation

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

We introduce DatasetGAN: an automatic procedure to generate massive datasets of high-quality semantically segmented images requiring minimal human effort. Current deep networks are extremely data-hungry, benefiting from training on large-scale datasets, which are time-consuming to annotate. Our method relies on the power of recent GANs to generate realistic images. We show how the GAN latent code can be decoded to produce a semantic segmentation of the image. Training the decoder only needs a few labeled examples to generalize to the rest of the latent space, resulting in an infinite annotated dataset generator! These generated datasets can then be used for training any computer vision architecture just as real datasets are. As only a few images need to be manually segmented, it becomes possible to annotate images in extreme detail and generate datasets with rich object and part segmentations. To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts. Our approach outperforms all semi-supervised baselines significantly and is on par with fully supervised methods, which in some cases require as much as 100x more annotated data than our method.

https://arxiv.org/abs/2104.06490

In this paper, we present DatasetGAN, which can generate large amounts of labeled data for semantic segmentation tasks. The GAN latent code can be decoded into a segmentation map, and training this decoder requires only a small number of annotated examples.
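A minimal sketch of the idea in PyTorch, assuming a pretrained GAN whose upsampled intermediate feature maps are stacked into one tensor per image; the gan_features hook, feature dimension, and training loop are assumptions for illustration, not DatasetGAN's released code:

    import torch
    import torch.nn as nn

    class PixelLabelDecoder(nn.Module):
        """Minimal DatasetGAN-style label decoder: a small MLP classifying each
        pixel from concatenated, upsampled GAN feature maps."""
        def __init__(self, feat_dim, num_classes):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim, 128), nn.ReLU(),
                nn.Linear(128, num_classes),
            )

        def forward(self, feats):  # feats: (B, C, H, W) stacked GAN features
            B, C, H, W = feats.shape
            x = feats.permute(0, 2, 3, 1).reshape(-1, C)  # one vector per pixel
            return self.mlp(x).view(B, H, W, -1).permute(0, 3, 1, 2)

    # Hypothetical usage -- gan_features(z) stands in for hooking and upsampling
    # the GAN's intermediate activations; masks are the few annotated images:
    # logits = decoder(gan_features(z))        # (B, num_classes, H, W)
    # loss = nn.functional.cross_entropy(logits, masks)

Because the decoder is trained on GAN features rather than pixels, any new latent sample z yields an image together with its predicted mask, which is what makes the "infinite annotated dataset" possible.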

Improving Object Detection in Art Images Using Only Style Transfer

Despite recent advances in object detection using deep learning neural networks, these neural networks still struggle to identify objects in art images such as paintings and drawings. This challenge is known as the cross depiction problem and it stems in part from the tendency of neural networks to prioritize identification of an object’s texture over its shape. In this paper we propose and evaluate a process for training neural networks to localize objects – specifically people – in art images. We generate a large dataset for training and validation by modifying the images in the COCO dataset using AdaIN style transfer. This dataset is used to fine-tune a Faster R-CNN object detection network, which is then tested on the existing People-Art testing dataset. The result is a significant improvement on the state of the art and a new way forward for creating datasets to train neural networks to process art images.

https://arxiv.org/abs/2102.06529

Although deep learning has made great strides in object detection, detectors still perform poorly on art images such as paintings. The problem stems largely from the tendency of neural networks to reason about an object's texture rather than its shape. In this paper we propose and evaluate a pipeline for training detectors to localize people in artworks: we build a large training set by applying AdaIN style transfer to the COCO dataset, fine-tune a Faster R-CNN detector on it, and then test on the People-Art dataset. The results show that this method substantially improves detection performance on art images.
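At the core of that pipeline is the AdaIN operation, which re-normalizes content features with the style image's per-channel statistics. A minimal sketch (in the full method these feature maps come from a pretrained VGG encoder and are decoded back into a stylized image):

    import torch

    def adain(content_feat, style_feat, eps=1e-5):
        """Adaptive Instance Normalization on (B, C, H, W) feature maps:
        re-normalize content features with the style's per-channel mean/std."""
        c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
        c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
        s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
        s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
        return s_std * (content_feat - c_mean) / c_std + s_mean

Replacing the per-channel statistics swaps the low-level texture while leaving the spatial (shape) structure of the content features intact, which is exactly the property the paper exploits to push the detector toward shape cues.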

RoI Tanh-polar Transformer Network for Face Parsing in the Wild

Face parsing aims to predict pixel-wise labels for facial components of a target face in an image. Existing approaches usually crop the target face from the input image with respect to a bounding box calculated during pre-processing, and thus can only parse inner facial Regions of Interest (RoIs). Peripheral regions like hair are ignored and nearby faces that are partially included in the bounding box can cause distractions. Moreover, these methods are only trained and evaluated on near-frontal portrait images and thus their performance for in-the-wild cases was unexplored. To address these issues, this paper makes three contributions. First, we introduce iBugMask dataset for face parsing in the wild containing 1,000 manually annotated images with large variations in sizes, poses, expressions and background, and Helen-LP, a large-pose training set containing 21,866 images generated using head pose augmentation. Second, we propose RoI Tanh-polar transform that warps the whole image to a Tanh-polar representation with a fixed ratio between the face area and the context, guided by the target bounding box. The new representation contains all information in the original image, and allows for rotation equivariance in the convolutional neural networks (CNNs). Third, we propose a hybrid residual representation learning block, coined HybridBlock, that contains convolutional layers in both the Tanh-polar space and the Tanh-Cartesian space, allowing for receptive fields of different shapes in CNNs. Through extensive experiments, we show that the proposed method significantly improves the state-of-the-art for face parsing in the wild.

https://arxiv.org/abs/2102.02717

Face parsing aims at pixel-level prediction for facial components. Existing methods usually crop the target face with a bounding box computed during pre-processing, and can therefore only parse the inner facial RoIs: peripheral regions such as hair are ignored, and nearby faces partially included in the box cause distractions. Moreover, these methods are only trained and evaluated on near-frontal portraits, so their in-the-wild performance has been unexplored. To address these issues, this paper introduces the iBugMask dataset, with 1,000 manually annotated images covering large variations in size, pose, expression and background, together with Helen-LP, a large-pose training set of 21,866 images generated by head-pose augmentation of real face images. It then proposes the RoI Tanh-polar transform, which, guided by the target bounding box, warps the whole image into a Tanh-polar representation with a fixed ratio between the face area and its context. The new representation retains all the information in the original image and allows rotation equivariance in CNNs. Finally, it proposes a hybrid residual learning block, called HybridBlock, with convolutional layers in both the Tanh-polar and the Tanh-Cartesian spaces, allowing receptive fields of different shapes. Experiments show the model achieves state-of-the-art results on in-the-wild datasets.
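A hedged sketch of the warping step, assuming a tanh-compressed radial coordinate around the box center and PyTorch's grid_sample; the paper's exact parameterization (elliptical RoI, the fixed face/context ratio) differs in detail:

    import math
    import torch
    import torch.nn.functional as F

    def roi_tanh_polar_warp(img, box, out_h, out_w):
        """Sketch of an RoI tanh-polar warp: each output row is a tanh-compressed
        radius, each column an angle, centered on the face box (x1, y1, x2, y2).
        img: (1, C, H, W)."""
        _, _, H, W = img.shape
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        R = max(box[2] - box[0], box[3] - box[1]) / 2   # assumed RoI radius

        i = torch.linspace(0, 1 - 1e-3, out_h)          # tanh radius in [0, 1)
        j = torch.linspace(0, 2 * math.pi, out_w)       # angle
        r, theta = torch.meshgrid(i, j, indexing="ij")
        d = R * torch.atanh(r)                          # invert the compression
        src_x = cx + d * torch.cos(theta)
        src_y = cy + d * torch.sin(theta)

        # grid_sample expects source coordinates normalized to [-1, 1]
        grid = torch.stack([2 * src_x / (W - 1) - 1,
                            2 * src_y / (H - 1) - 1], dim=-1)
        return F.grid_sample(img, grid.unsqueeze(0), align_corners=True)

Because the radius is tanh-compressed, pixels near the face get most of the output resolution while the entire image, however far from the box, still maps somewhere into the representation.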

Learning domain-agnostic visual representation for computational pathology using medically-irrelevant style transfer augmentation

Suboptimal generalization of machine learning models on unseen data is a key challenge which hampers the clinical applicability of such models to medical imaging. Although various methods such as domain adaptation and domain generalization have evolved to combat this challenge, learning robust and generalizable representations is core to medical image understanding, and continues to be a problem. Here, we propose STRAP (Style TRansfer Augmentation for histoPathology), a form of data augmentation based on random style transfer from artistic paintings, for learning domain-agnostic visual representations in computational pathology. Style transfer replaces the low-level texture content of images with the uninformative style of randomly selected artistic paintings, while preserving high-level semantic content. This improves robustness to domain shift and can be used as a simple yet powerful tool for learning domain-agnostic representations. We demonstrate that STRAP leads to state-of-the-art performance, particularly in the presence of domain shifts, on a particular classification task of predicting microsatellite status in colorectal cancer using digitized histopathology images.

https://arxiv.org/abs/2102.01678

Poor generalization of machine learning models on unseen data is a key challenge that limits their clinical applicability to medical imaging. Although methods such as domain adaptation and domain generalization have been developed to combat this, learning robust and generalizable representations remains a core problem in medical image understanding. This paper proposes STRAP, a data augmentation method based on random style transfer from artistic paintings, for learning domain-agnostic visual representations in computational pathology. Style transfer replaces the low-level texture of an image with the uninformative style of a randomly chosen painting while preserving the high-level semantic content, which improves robustness to domain shift and serves as a simple tool for learning domain-agnostic representations. The authors demonstrate state-of-the-art performance of STRAP, particularly under domain shift, on predicting microsatellite status from colorectal cancer histopathology images.
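A sketch of how STRAP-style augmentation might be wired into a data pipeline; stylize is a placeholder for any off-the-shelf style-transfer model (e.g., an AdaIN network as sketched above), and the probability p is an assumed hyperparameter:

    import random

    class StrapAugment:
        """STRAP-style augmentation sketch: with probability p, replace the
        low-level texture of a histopathology patch by transferring the style
        of a randomly chosen artistic painting."""
        def __init__(self, paintings, stylize, p=0.5):
            self.paintings = paintings   # pool of style images
            self.stylize = stylize       # fn(content_img, style_img) -> image
            self.p = p

        def __call__(self, patch):
            if random.random() < self.p:
                style = random.choice(self.paintings)
                return self.stylize(patch, style)
            return patch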

Training Generative Adversarial Networks with Limited Data

Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.

https://arxiv.org/abs/2006.06676

Training a GAN with very little data causes discriminator overfitting, so training fails to converge. We propose an adaptive discriminator augmentation mechanism that effectively stabilizes training on small datasets. Our method requires no changes to loss functions or network architectures, and applies both when training from scratch and when fine-tuning. Experiments on several datasets show that with only a few thousand training images our method matches the performance of the original StyleGAN2. We expect this to open up new application domains for GANs; we also show that CIFAR-10 is in fact a limited-data benchmark, on which we improve the record FID from 5.59 to 2.42.

AutoSimulate: (Quickly) Learning Synthetic Data Generation

Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually relying on REINFORCE-like gradient estimators. However, these approaches are very expensive as they treat the entire data generation, model training, and validation pipeline as a black-box and require multiple costly objective evaluations at each iteration. We propose an efficient alternative for optimal synthetic data generation, based on a novel differentiable approximation of the objective. This allows us to optimize the simulator, which may be non-differentiable, requiring only one objective evaluation at each iteration with little overhead. We demonstrate on a state-of-the-art photorealistic renderer that the proposed method finds the optimal data distribution faster (up to 50x), with significantly reduced training data generation (up to 30x) and better accuracy (+8.7%) on real-world test datasets than previous methods.

https://www.microsoft.com/en-us/research/uploads/prod/2020/08/autosimulate_eccv20.pdf

In many machine learning applications, simulation is used to synthesize large amounts of labeled data. Existing methods adjust the simulator's parameters so that the synthesized data maximizes performance on a validation task, usually relying on REINFORCE-like gradient estimators. However, these methods treat the whole pipeline of data generation, model training and validation as a black box, and require several costly objective evaluations at every iteration. This paper therefore proposes a method for optimal synthetic data generation based on a differentiable approximation of the objective. The proposed method can optimize a simulator that may itself be non-differentiable, and needs only one objective evaluation per iteration. In experiments with a state-of-the-art photorealistic renderer, the method finds the optimal data distribution much faster, generates far less training data, and on top of that improves accuracy on real-world test sets.
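For contrast, here is a minimal sketch of the REINFORCE-style score-function estimator that the black-box baselines rely on; sample_scene, log_prob, and val_loss are hypothetical hooks, and each val_loss call hides a full train-and-validate cycle, which is exactly the per-iteration cost the paper removes:

    import torch

    def reinforce_grad(theta, sample_scene, log_prob, val_loss, n=8):
        """Score-function gradient of the expected validation loss w.r.t.
        simulator parameters theta (a tensor with requires_grad=True).
        Each of the n samples requires generating data, training a model,
        and evaluating it -- hence the expense of this family of estimators."""
        grads = []
        for _ in range(n):
            scene = sample_scene(theta)        # draw a simulator configuration
            loss = val_loss(scene)             # train + validate (costly!)
            lp = log_prob(theta, scene)        # scalar, differentiable in theta
            grads.append(loss * torch.autograd.grad(lp, theta)[0])
        return torch.stack(grads).mean(0)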

Data Augmentation Using Generative Adversarial Network

Effective training of neural networks requires much data. In the low-data regime, parameters are underdetermined, and learnt networks generalise poorly. Data Augmentation alleviates this by using existing data more effectively. However standard data augmentation produces only limited plausible alternative data. Given there is potential to generate a much broader set of augmentations, we design and train a generative model to do data augmentation. The model, based on image conditional Generative Adversarial Networks, takes data from a source domain and learns to take any data item and generalise it to generate other within-class data items. As this generative process does not depend on the classes themselves, it can be applied to novel unseen classes of data. We show that a Data Augmentation Generative Adversarial Network (DAGAN) augments standard vanilla classifiers well. We also show a DAGAN can enhance few-shot learning systems such as Matching Networks. We demonstrate these approaches on Omniglot, on EMNIST having learnt the DAGAN on Omniglot, and VGG-Face data. In our experiments we can see over 13% increase in accuracy in the low-data regime experiments in Omniglot (from 69% to 82%), EMNIST (73.9% to 76%) and VGG-Face (4.5% to 12%); in Matching Networks for Omniglot we observe an increase of 0.5% (from 96.9% to 97.4%) and an increase of 1.8% in EMNIST (from 59.5% to 61.3%).

https://arxiv.org/pdf/1711.04340.pdf

Training a neural network usually requires a lot of data, and standard data augmentation cannot supply enough high-quality new samples. This paper proposes a conditional-GAN-based augmentation method: the generator takes a real data item as a reference, together with a random vector for diversity, and produces a synthetic sample of the same class as the input. The discriminator takes real and synthetic data and learns the difference between the two distributions; the overall objective is to generate diverse synthetic data that matches the real data distribution.
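A minimal sketch of the generator side of a DAGAN, with placeholder encoder/decoder networks (the paper uses U-Net-style networks; note the class label is never given explicitly, which is what lets the model transfer to unseen classes):

    import torch
    import torch.nn as nn

    class DaganGenerator(nn.Module):
        """DAGAN generator sketch: encode a real exemplar, concatenate a noise
        vector, and decode a new within-class sample. `encoder` is assumed to
        return a flat embedding of shape (B, D)."""
        def __init__(self, encoder, decoder, z_dim=100):
            super().__init__()
            self.encoder, self.decoder, self.z_dim = encoder, decoder, z_dim

        def forward(self, x):
            h = self.encoder(x)                            # exemplar embedding
            z = torch.randn(x.size(0), self.z_dim, device=x.device)
            return self.decoder(torch.cat([h, z], dim=1))  # same-class variant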

AutoAugment: Learning Augmentation Strategies from Data

Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a Top-1 accuracy of 83.5% which is 0.4% better than the previous record of 83.1%. On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art. Augmentation policies we find are transferable between datasets. The policy learned on ImageNet transfers well to achieve significant improvements on other datasets, such as Oxford Flowers, Caltech-101, Oxford-IIIT Pets, FGVC Aircraft, and Stanford Cars.

https://arxiv.org/pdf/1805.09501.pdf

Data augmentation has long been valued in machine learning as an effective technique, but existing augmentation schemes are designed by hand. This paper proposes an automated method whose core idea is to search a policy space for sub-policies, each consisting of two simple augmentation operations. An RNN controller is updated using the validation accuracy of a child network, so that it gradually selects better sub-policies. Experiments show the method effectively improves standard image classifiers across a range of datasets.
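A toy sketch of the policy structure: each sub-policy is two (operation, probability, magnitude) triples, and one sub-policy is drawn at random per image. The operations and magnitudes below are illustrative placeholders, not the learned CIFAR/ImageNet policies from the paper:

    import random
    from PIL import ImageOps

    POLICY = [
        [("rotate", 0.7, 15), ("equalize", 0.3, None)],
        [("solarize", 0.5, 128), ("rotate", 0.6, -10)],
        [("posterize", 0.4, 4), ("equalize", 0.8, None)],
    ]

    def apply_op(img, op, mag):
        if op == "rotate":
            return img.rotate(mag)
        if op == "solarize":
            return ImageOps.solarize(img, mag)    # invert pixels above threshold
        if op == "posterize":
            return ImageOps.posterize(img, mag)   # reduce bits per channel
        return ImageOps.equalize(img)

    def autoaugment(img):
        """Apply one randomly chosen sub-policy to a PIL image."""
        for op, prob, mag in random.choice(POLICY):
            if random.random() < prob:
                img = apply_op(img, op, mag)
        return img

The search algorithm's job is then simply to pick which triples go into POLICY so as to maximize the child network's validation accuracy.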

Training Generative Adversarial Networks with Limited Data

Paper: https://arxiv.org/pdf/2006.06676.pdf

Large datasets have been driving the progress of generative models, yet collecting a sufficiently large image set for a specific application is challenging: the data is constrained by subject type, image quality, geographic location, time period, privacy, copyright status, and more. CelebA, for example, imposes strict requirements on face position, quality and image size, and satisfying such requirements across hundreds of thousands of images is an enormous amount of work.

GANs are typically trained with 10^5–10^6 samples, which is hard to reach for medical imaging and other small-sample settings. With too little data the discriminator overfits the training set, its feedback to the generator becomes meaningless, and training diverges. The paper measures how dataset size affects generation quality, with the results shown in its figures.

The cover figure shows baseline results on different subsets of FFHQ. In each case training starts out the same way, but as it proceeds the FID begins to rise, and the less training data there is, the earlier this happens. Figures (b) and (c) show the discriminator's output distributions for real and generated images over the course of training. The distributions initially overlap but keep drifting apart as the discriminator grows more confident, and the point where the FID starts to deteriorate coincides with the loss of sufficient overlap between them. Figure (c) further shows that once the discriminator overfits, even a validation set of real images is scored like the generated distribution, which is strong evidence that the discriminator has memorized the training data.

Since this overfitting is caused by insufficient data, can it be fixed simply by augmenting the dataset (rotations, added noise)?

Augmentation works well for discriminative tasks such as classifier training, but naively augmenting a GAN's training data causes "leakage": the GAN learns to generate the augmented data distribution. This paper uses diverse augmentations to prevent discriminator overfitting while ensuring the augmentations do not leak into the generated images. To summarize the advantages of the ADA method for generative models:

  • ADA enables good-quality generation with few training samples
  • ADA prevents "leakage" while still benefiting from data augmentation
  • The adaptive discriminator augmentation keeps the model from overfitting easily, making training more stable
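A minimal sketch of the adaptive part, following the paper's heuristic that r_t = E[sign(D(real))] indicates discriminator overfitting; the target value and step size below are illustrative stand-ins for the paper's hyperparameters:

    def update_augment_p(p, d_real_sign_mean, target=0.6, step=0.005):
        """ADA feedback controller sketch: if the discriminator separates real
        data too confidently (r_t above target), overfitting is suspected and
        the augmentation probability p is raised; otherwise it is lowered."""
        p += step if d_real_sign_mean > target else -step
        return min(max(p, 0.0), 1.0)

Every augmentation is then applied to both real and generated images with this probability p, so the strength of the regularization tracks how badly the discriminator is overfitting at any point in training.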

Data augmentation using learned transformations for one-shot medical image segmentation

Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such images.
We present an automated data augmentation method for synthesizing labeled medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transformations from the images, and use the model along with the labeled example to synthesize additional labeled examples. Each transformation is comprised of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. We show that training a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation.

https://arxiv.org/pdf/1902.09383.pdf

Medical image segmentation has long played an important role in medical image analysis. But training a medical image segmentation network usually requires supervised learning with large amounts of labeled data; annotation is time-consuming and laborious, and generic augmentation methods lack the needed diversity. This paper therefore proposes a data augmentation method for brain MRI segmentation: given a single annotated brain MRI scan, the model synthesizes a set of labeled scans through learned spatial and appearance transforms. Experiments show that a segmentation model trained on the synthesized data achieves state-of-the-art performance on the one-shot segmentation task.
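A hedged sketch of the synthesis step, assuming the learned deformation field and intensity transform are already available (the networks that learn them from unlabeled scans are omitted); the 2D layout and the flow channel order are assumptions for illustration:

    import torch
    import torch.nn.functional as F

    def synthesize_labeled_example(atlas_img, atlas_seg, flow, intensity_fn):
        """Warp the single labeled scan and its segmentation with a learned
        spatial deformation, then apply a learned intensity transform.
        flow: (1, 2, H, W) pixel offsets (channel 0 = x, channel 1 = y);
        intensity_fn: placeholder for the learned appearance change."""
        _, _, H, W = atlas_img.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)   # identity grid
        # convert pixel offsets to grid_sample's normalized coordinates
        offset = flow.permute(0, 2, 3, 1) * torch.tensor([2 / (W - 1),
                                                          2 / (H - 1)])
        warped_img = F.grid_sample(atlas_img, grid + offset,
                                   align_corners=True)
        warped_seg = F.grid_sample(atlas_seg, grid + offset,
                                   mode="nearest", align_corners=True)
        return intensity_fn(warped_img), warped_seg

Applying the same deformation to the image and its segmentation is what keeps each synthesized scan correctly labeled for free, so a supervised segmenter can then be trained on the synthetic pairs.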