On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation

We investigate the sensitivity of the Fréchet Inception Distance (FID) score to inconsistent and often incorrect implementations across different image processing libraries. FID score is widely used to evaluate generative models, but each FID implementation uses a different low-level image processing process. Image resizing functions in commonly-used deep learning libraries often introduce aliasing artifacts. We observe that numerous subtle choices need to be made for FID calculation and a lack of consistencies in these choices can lead to vastly different FID scores. In particular, we show that the following choices are significant: (1) selecting what image resizing library to use, (2) choosing what interpolation kernel to use, (3) what encoding to use when representing images. We additionally outline numerous common pitfalls that should be avoided and provide recommendations for computing the FID score accurately. We provide an easy-to-use optimized implementation of our proposed recommendations in the accompanying code.

https://arxiv.org/abs/2104.11222

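The paper finds that FID scores are sensitive to which image processing library an implementation relies on. FID is a widely used metric for evaluating generative models, yet its implementations differ in their low-level image processing. The image resizing functions in commonly used deep learning libraries often introduce aliasing artifacts. Computing FID therefore involves several choices that must be made consistently, because they change the score substantially: (1) which library to use for image resizing; (2) which interpolation kernel to use when resizing; (3) which encoding to use when storing images.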
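To make the aliasing point concrete, here is a minimal pure-Python sketch. It is not the paper's code and does not call any real resizing library: the helper names (`make_stripes`, `downsample_nearest`, `downsample_box`) are hypothetical, and the box filter merely stands in for a properly antialiased kernel. It downsamples a one-pixel-wide stripe pattern by a factor of 2, once by picking every other pixel and once by averaging each 2x2 block.

```python
# Illustrative sketch only: hypothetical helpers, not the paper's implementation.

def make_stripes(width, height):
    """Grayscale image (list of rows) with 1-pixel vertical stripes of 0 and 255."""
    return [[255 * (x % 2) for x in range(width)] for _ in range(height)]


def downsample_nearest(img, factor):
    """Keep every `factor`-th pixel with no prefiltering (prone to aliasing)."""
    return [row[::factor] for row in img[::factor]]


def downsample_box(img, factor):
    """Average each factor x factor block (a simple antialiasing filter)."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(0, height - height % factor, factor):
        out_row = []
        for x in range(0, width - width % factor, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor)
                     for dx in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def mean_intensity(img):
    """Mean pixel value over the whole image."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))


if __name__ == "__main__":
    img = make_stripes(8, 8)
    print("original mean:", mean_intensity(img))
    print("nearest mean: ", mean_intensity(downsample_nearest(img, 2)))
    print("box mean:     ", mean_intensity(downsample_box(img, 2)))
```

In this toy case the unfiltered version collapses the stripes into a solid black image, while the averaged version keeps the original mean intensity. A difference of this kind, introduced before images reach the Inception network, is one way the choice of resizing routine can shift a feature-based score such as FID.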
