
RandAugment: Practical automated data augmentation with a reduced search space

Recent work has shown that data augmentation has the potential to significantly improve the generalization of deep learning models. Recently, automated augmentation strategies have led to state-of-the-art results in image classification and object detection. While these strategies were optimized for improving validation accuracy, they also led to state-of-the-art results in semi-supervised learning and improved robustness to common corruptions of images. An obstacle to a large-scale adoption of these methods is a separate search phase which increases the training complexity and may substantially increase the computational cost. Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size. Automated augmentation policies are often found by training small models on small datasets and subsequently applied to train larger models. In this work, we remove both of these obstacles. RandAugment has a significantly reduced search space which allows it to be trained on the target task with no need for a separate proxy task. Furthermore, due to the parameterization, the regularization strength may be tailored to different model and dataset sizes. RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous automated augmentation approaches on CIFAR-10/100, SVHN, and ImageNet. On the ImageNet dataset we achieve 85.0% accuracy, a 0.6% increase over the previous state-of-the-art and 1.0% increase over baseline augmentation. On object detection, RandAugment leads to 1.0-1.3% improvement over baseline augmentation, and is within 0.3% mAP of AutoAugment on COCO. Finally, due to its interpretable hyperparameter, RandAugment may be used to investigate the role of data augmentation with varying model and dataset size. Code is available online.

https://arxiv.org/abs/1909.13719
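The "significantly reduced search space" comes from collapsing the policy down to just two interpretable hyperparameters: N, the number of transforms applied in sequence, and M, a single shared magnitude. Below is a minimal toy sketch of that sampling loop; the three ops are simplified stand-ins (real implementations use the paper's full set of ~14 transforms on PIL images or tensors), and the op names and magnitude scaling here are illustrative assumptions, not the paper's exact definitions.

```python
import random

# Toy stand-ins for RandAugment's image transforms, operating on a 2D list of
# 8-bit pixel values. Real ops (Rotate, Shear, Color, etc.) act on full images.

def identity(img, m):
    # no-op transform (magnitude ignored)
    return img

def invert(img, m):
    # invert pixel values; magnitude-independent, like the paper's Invert op
    return [[255 - p for p in row] for row in img]

def brightness(img, m):
    # shift brightness proportionally to magnitude m (assumed scale 0..30)
    shift = int(255 * m / 30)
    return [[min(255, p + shift) for p in row] for row in img]

OPS = [identity, invert, brightness]

def rand_augment(img, n=2, m=9, rng=None):
    """Apply n transforms sampled uniformly with replacement, each at the
    shared magnitude m -- RandAugment's two hyperparameters (N, M)."""
    rng = rng or random.Random()
    for op in rng.choices(OPS, k=n):
        img = op(img, m)
    return img

img = [[0, 128], [255, 64]]
augmented = rand_augment(img, n=2, m=9, rng=random.Random(0))
```

Because the policy is just a uniform draw over ops at a fixed magnitude, there is nothing to search per-dataset beyond grid-searching N and M directly on the target task, which is why no separate proxy task is needed.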

Recent research has shown that data augmentation can greatly improve the generalization ability of deep learning models. Lately, automated augmentation strategies have brought considerable gains in image classification and object detection. Although these strategies were optimized for validation accuracy, they have also delivered strong results in semi-supervised learning and improved robustness to corrupted images. The obstacle to large-scale adoption is the separate search phase, which increases training complexity and computational cost. Moreover, because of this separate search phase, these methods cannot adapt the regularization strength to each model or dataset size: automated augmentation policies are typically found by training small models on small datasets and then applied to training larger models. In this paper, RandAugment is proposed to address these problems. The method greatly reduces the search space, allowing it to be trained directly on the target task with no separate proxy task. Thanks to its parameterization, the regularization strength can be tailored to different model and dataset sizes. The method applies uniformly across different tasks and datasets, matching or surpassing existing automated augmentation approaches.