AutoSimulate: (Quickly) Learning Synthetic Data Generation

Simulation is increasingly being used for generating large labelled datasets in many machine learning problems. Recent methods have focused on adjusting simulator parameters with the goal of maximising accuracy on a validation task, usually relying on REINFORCE-like gradient estimators. However, these approaches are very expensive, as they treat the entire data generation, model training, and validation pipeline as a black box and require multiple costly objective evaluations at each iteration. We propose an efficient alternative for optimal synthetic data generation, based on a novel differentiable approximation of the objective. This allows us to optimize the simulator, which may be non-differentiable, requiring only one objective evaluation at each iteration with little overhead. We demonstrate on a state-of-the-art photorealistic renderer that the proposed method finds the optimal data distribution faster (up to 50x), with significantly reduced training data generation (up to 30x) and better accuracy (+8.7%) on real-world test datasets than previous methods.

https://www.microsoft.com/en-us/research/uploads/prod/2020/08/autosimulate_eccv20.pdf
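To make the contrast concrete, here is a minimal toy sketch of the bi-level setup the abstract describes and of the REINFORCE-style (score-function) estimator that prior black-box approaches rely on. All names (`simulate`, `train_model`, `val_loss`, `psi`) are illustrative placeholders, not code from the paper; the point is simply that every gradient sample reruns the full generate → train → validate pipeline.

```python
# Toy stand-in for the black-box pipeline; illustrative names only.
import numpy as np

rng = np.random.default_rng(0)

def simulate(psi, n=256):
    """Stand-in for a (possibly non-differentiable) simulator: draws a
    dataset whose distribution is controlled by the parameter psi."""
    return rng.normal(loc=psi, scale=1.0, size=n)

def train_model(data):
    """Stand-in for the inner training loop: 'trains' a trivial model."""
    return data.mean()                      # the trained weights

def val_loss(weights, val_target=2.0):
    """Validation objective evaluated with the trained model."""
    return (weights - val_target) ** 2

def reinforce_grad(psi, sigma=0.3, n_samples=8):
    """REINFORCE-style (score-function) estimate of d val_loss / d psi,
    using the unperturbed loss as a baseline.  Every sample reruns the
    whole generate -> train -> validate pipeline, which is what makes
    black-box approaches expensive."""
    baseline = val_loss(train_model(simulate(psi)))
    grad = 0.0
    for _ in range(n_samples):
        eps = rng.normal()
        loss = val_loss(train_model(simulate(psi + sigma * eps)))
        grad += (loss - baseline) * eps / sigma
    return grad / n_samples

psi = 0.0                                   # initial simulator parameter
for step in range(100):
    psi -= 0.05 * reinforce_grad(psi)       # 9 full pipeline runs per step
print("learned simulator parameter:", psi)  # moves toward ~2.0
```

Because each perturbation repeats the entire pipeline, the cost per optimisation step grows with the number of samples in the estimate; this is the expense the paper targets.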

In many machine learning applications, simulation is used to synthesize large amounts of labelled data. To make use of such synthetic data, existing methods focus on tuning the simulator's parameters so that the generated data maximises performance on a validation set, and they typically rely on REINFORCE-like gradient estimators. However, these methods treat the whole pipeline of data generation, model training, and validation as a black box, and require expensive evaluations at every iteration. This paper therefore proposes an approach to optimal synthetic data generation based on a differentiable approximation of the objective. The proposed method can optimise a simulator that may be non-differentiable, and needs only a single objective evaluation per iteration. In experiments with a state-of-the-art photorealistic renderer, the method converges to the optimal data distribution much faster, requires far less generated training data, and at the same time improves accuracy on real-world test sets.
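The summary above says the proposed method needs only one objective evaluation per iteration because it differentiates an approximation of the objective rather than the black-box pipeline. The sketch below shows one plausible shape such an approximation can take; it is not the paper's exact derivation. The inner training is replaced by a single SGD step on the simulated data, and the validation loss is differentiated through that step with respect to the distribution parameter `psi` via the score function of `p_psi`, so the simulator itself is never differentiated. All symbols (`psi`, `alpha`, `theta`) are illustrative.

```python
# Hedged sketch: single-step approximation of the inner training loop,
# differentiated w.r.t. the data-distribution parameter psi via the score
# function of p_psi.  One simulation, one (approximate) training step and
# one validation per iteration; names are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
alpha, val_target = 1.0, 2.0        # inner step size, validation target

psi = 0.0                           # simulator / data-distribution parameter
for step in range(300):
    x = rng.normal(loc=psi, scale=1.0, size=256)   # simulate once, x ~ N(psi, 1)

    # Inner training approximated by one SGD step from theta0 = 0 on the
    # per-sample loss l(theta, x) = 0.5 * (theta - x)^2.
    theta0 = 0.0
    grad_theta = np.mean(theta0 - x)               # d(mean loss)/d theta at theta0
    theta1 = theta0 - alpha * grad_theta           # "trained" weights

    # Gradient of the validation loss 0.5 * (theta1 - val_target)^2 w.r.t. theta1.
    dval_dtheta = theta1 - val_target

    # Score-function identity: d/dpsi E[dl/dtheta] = E[(dl/dtheta) * dlogp/dpsi],
    # with dlog p_psi(x)/dpsi = (x - psi) for a unit-variance Gaussian.
    dgrad_dpsi = np.mean((theta0 - x) * (x - psi))
    dval_dpsi = dval_dtheta * (-alpha) * dgrad_dpsi

    psi -= 0.05 * dval_dpsi                        # update the simulator parameter
print("learned simulator parameter:", psi)         # moves toward ~2.0
```

The saving in this sketch is that each iteration reuses a single batch of simulated data for one approximate training step and one validation, instead of re-running training for every perturbation as in the score-function baseline above.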
