Tag Archive: Frequency Domain

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or suffer from long inference times due to the heavy computational burden of convolving high-resolution feature maps. In this paper, we focus on speeding up high-resolution photorealistic I2IT tasks based on closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we reveal that attribute transformations, such as illumination and color manipulation, relate more to the low-frequency component, while the content details can be adaptively refined on high-frequency components. We consequently propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks, where we design a lightweight network for translating the low-frequency component at reduced resolution and a progressive masking strategy to efficiently refine the high-frequency ones. Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details. Extensive experimental results on various tasks demonstrate that the proposed method can translate 4K images in real time on a single normal GPU while achieving comparable transformation performance against existing methods.

https://arxiv.org/abs/2105.09188

Existing I2IT methods are hampered by low-resolution images and long inference times. In this paper, we tackle high-resolution I2IT via closed-form Laplacian pyramid decomposition and reconstruction. We find that illumination and color changes relate mostly to the low-frequency component of an image, while its content details relate to the high-frequency components. We propose a Laplacian Pyramid Translation Network (LPTN): a lightweight network that translates the low-frequency component at reduced resolution and refines the high-frequency components with a progressive masking strategy. Our model avoids most of the heavy computation while preserving as much image detail as possible. In experiments, our model performs style transfer on 4K images in real time.
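The closed-form decomposition and reconstruction the paper builds on is the classic Laplacian pyramid, which is exactly invertible by construction. Below is a minimal PyTorch sketch of the idea, not the authors' code: it uses bilinear resampling where LPTN uses a fixed smoothing kernel, and the function names are ours.

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid(img, num_levels=3):
    """Decompose an (N, C, H, W) image into high-frequency residuals
    plus one low-frequency image at the coarsest resolution."""
    pyramid = []
    current = img
    for _ in range(num_levels):
        down = F.interpolate(current, scale_factor=0.5,
                             mode='bilinear', align_corners=False)
        up = F.interpolate(down, size=current.shape[2:],
                           mode='bilinear', align_corners=False)
        pyramid.append(current - up)  # high-frequency residual at this level
        current = down
    pyramid.append(current)           # low-frequency component
    return pyramid

def reconstruct(pyramid):
    """Exact inverse: upsample and add the residuals back, coarse to fine."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        up = F.interpolate(current, size=residual.shape[2:],
                           mode='bilinear', align_corners=False)
        current = up + residual
    return current

x = torch.rand(1, 3, 512, 512)
levels = laplacian_pyramid(x)
assert torch.allclose(reconstruct(levels), x, atol=1e-6)
```

Because each high-frequency level stores the residual against the upsampled coarse image, adding the residuals back recovers the input losslessly; this is what lets LPTN run its translation network only on the small low-frequency image and touch the full-resolution residuals with cheap masking.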

SWAGAN: A Style-based Wavelet-driven Generative Model

In recent years, considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). Even so, these networks still suffer from degradation in quality for high-frequency content, stemming from a spectrally biased architecture, and similarly unfavorable loss functions. To address this issue, we present a novel general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain. SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach yields enhancements in the visual quality of the generated images, and considerably increases computational performance. We demonstrate the advantage of our method by integrating it into the StyleGAN2 framework, and verifying that content generation in the wavelet domain leads to higher quality images with more realistic high-frequency content. Furthermore, we verify that our model’s latent space retains the qualities that allow StyleGAN to serve as a basis for a multitude of editing tasks, and show that our frequency-aware approach also induces improved downstream visual quality.

https://arxiv.org/abs/2102.06108

In recent years, the quality of images generated by GANs has improved markedly. Even so, GAN outputs still fall short on high-frequency content, a problem caused by spectrally biased architectures and ill-suited loss functions. To address this, we propose SWAGAN, a general-purpose Style- and WAvelet-based GAN for generation in the frequency domain. SWAGAN incorporates wavelets into its generator and discriminator, forcing the network to learn a frequency-aware latent representation. This architecture improves the visual quality of generated images while saving computation. We demonstrate the benefit of integrating our method into StyleGAN2 and verify that generating content in the wavelet domain yields higher-quality images with more realistic high-frequency detail. We also verify that our model's latent space retains enough of the properties that let StyleGAN serve as a basis for downstream image-editing tasks, showing that our frequency-aware approach also improves downstream visual quality.
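The simplest wavelet that a pipeline like this can be built on is the Haar transform, which splits a feature map into four half-resolution sub-bands and is exactly invertible. The following is an illustrative PyTorch sketch (assuming even spatial dimensions), not SWAGAN's implementation:

```python
import torch

def haar_dwt(x):
    """Single-level 2D Haar transform of an (N, C, H, W) tensor into four
    half-resolution sub-bands: LL (low-pass) and LH/HL/HH (detail)."""
    a = x[:, :, 0::2, 0::2]  # top-left of each 2x2 block
    b = x[:, :, 0::2, 1::2]  # top-right
    c = x[:, :, 1::2, 0::2]  # bottom-left
    d = x[:, :, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

def haar_idwt(ll, lh, hl, hh):
    """Exact inverse of haar_dwt."""
    a = (ll + lh + hl + hh) / 2
    b = (ll + lh - hl - hh) / 2
    c = (ll - lh + hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    out = ll.new_zeros(ll.shape[0], ll.shape[1],
                       ll.shape[2] * 2, ll.shape[3] * 2)
    out[:, :, 0::2, 0::2] = a
    out[:, :, 0::2, 1::2] = b
    out[:, :, 1::2, 0::2] = c
    out[:, :, 1::2, 1::2] = d
    return out

x = torch.rand(1, 3, 64, 64)
assert torch.allclose(haar_idwt(*haar_dwt(x)), x, atol=1e-6)
```

Generating the sub-bands directly means the network is explicitly asked to produce high-frequency detail (LH/HL/HH) instead of hoping it emerges from pixel-space convolutions, which is the intuition behind the paper's "frequency-aware" claim.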

FcaNet: Frequency Channel Attention Networks

https://arxiv.org/abs/2012.11879

Attention mechanisms, especially channel attention, have gained great success in the computer vision field. Many works focus on how to design efficient channel attention mechanisms while ignoring a fundamental problem, i.e., using global average pooling (GAP) as the unquestioned pre-processing method. In this work, we start from a different view and rethink channel attention using frequency analysis. Based on the frequency analysis, we mathematically prove that conventional GAP is a special case of feature decomposition in the frequency domain. With this proof, we naturally generalize the pre-processing of the channel attention mechanism to the frequency domain and propose FcaNet with novel multi-spectral channel attention. The proposed method is simple but effective. We can change only one line of code to implement our method within existing channel attention methods. Moreover, the proposed method achieves state-of-the-art results compared with other channel attention methods on image classification, object detection, and instance segmentation tasks. Our method improves Top-1 accuracy on ImageNet by 1.8% compared with the baseline SENet-50, with the same number of parameters and the same computational cost. Our code and models will be made publicly available.

Attention mechanisms, channel attention in particular, have achieved great success in computer vision. Many works focus on designing efficient channel attention mechanisms while ignoring a fundamental question: the use of global average pooling (GAP) as the default pre-processing step. In this paper, we reconsider channel attention from a different perspective using frequency-domain methods. Based on frequency analysis, we prove mathematically that conventional GAP is a special case of feature decomposition in the frequency domain. With this proof, we naturally generalize the pre-processing of the channel attention mechanism to the frequency domain and propose FcaNet, a model built on multi-spectral channel attention. The proposed method is simple but effective: it can be implemented within existing channel attention models by changing only a single line of code. Moreover, our method achieves state-of-the-art performance on image classification, object detection, and instance segmentation. With SENet-50 as the baseline, it improves Top-1 accuracy on ImageNet by 1.8% at the same parameter count and computational cost. Our code and models will be released publicly.
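The "one line of code" claim follows from the proof that GAP is the lowest-frequency term of the 2D discrete cosine transform: the (0, 0) DCT basis is constant, so projecting onto it reduces to a plain mean. The sketch below illustrates this identity and the multi-spectral generalization in PyTorch; it is our own illustration, and the frequency pairs are placeholders rather than the tuned selections from the paper.

```python
import math
import torch

def dct_basis(h, w, u, v):
    """2D DCT-II basis of frequency (u, v) on an h x w grid."""
    ys = torch.arange(h, dtype=torch.float32)
    xs = torch.arange(w, dtype=torch.float32)
    cy = torch.cos((2 * ys + 1) * u * math.pi / (2 * h))
    cx = torch.cos((2 * xs + 1) * v * math.pi / (2 * w))
    return cy[:, None] * cx[None, :]

x = torch.rand(2, 8, 16, 16)  # (N, C, H, W) feature map

# The (0, 0) basis is all ones, so the normalized DCT projection
# is exactly global average pooling.
dc = (x * dct_basis(16, 16, 0, 0)).sum(dim=(2, 3)) / (16 * 16)
assert torch.allclose(dc, x.mean(dim=(2, 3)), atol=1e-6)

# Multi-spectral channel attention: split channels into groups and
# project each group onto a different DCT frequency. The (u, v)
# picks here are illustrative only.
freqs = [(0, 0), (0, 1), (1, 0), (1, 1)]
groups = x.chunk(len(freqs), dim=1)
descriptor = torch.cat(
    [(g * dct_basis(16, 16, u, v)).sum(dim=(2, 3))
     for g, (u, v) in zip(groups, freqs)],
    dim=1)  # (N, C) vector fed to the usual SE-style FC layers
```

Swapping the GAP descriptor for this multi-spectral one leaves the rest of an SE-style block (the FC-ReLU-FC-sigmoid gating) unchanged, which is why the modification stays local to a single line in existing implementations.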

Focal Frequency Loss for Generative Models

Despite the remarkable success of generative models in creating photorealistic images using deep neural networks, gaps could still exist between the real and generated images, especially in the frequency domain. In this study, we find that narrowing the frequency domain gap can ameliorate the image synthesis quality further. To this end, we propose the focal frequency loss, a novel objective function that brings optimization of generative models into the frequency domain. The proposed loss allows the model to dynamically focus on the frequency components that are hard to synthesize by down-weighting the easy frequencies. This objective function is complementary to existing spatial losses, offering great impedance against the loss of important frequency information due to the inherent crux of neural networks. We demonstrate the versatility and effectiveness of focal frequency loss to improve various baselines in both perceptual quality and quantitative performance.

https://arxiv.org/pdf/2012.12821.pdf

Although deep generative models have achieved remarkable success in image synthesis, gaps still exist between real and generated images, particularly in the frequency domain. In this paper, we focus on narrowing this frequency-domain gap to further improve the quality of generated images. To this end, we propose the focal frequency loss, an objective function that optimizes generative models in the frequency domain. The proposed loss lets the model dynamically focus on the frequency components that are hard to synthesize by down-weighting the easy ones. As a complement to existing spatial losses, this objective counters the tendency of neural networks to lose important frequency information. We demonstrate the versatility and effectiveness of the focal frequency loss by improving both the perceptual quality and the quantitative performance of multiple baselines.
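The loss is easy to sketch with torch.fft: compare the spectra of the real and generated images, and weight each frequency's error by its own error magnitude, detached so the weights act as a focusing mask rather than an extra gradient path. The following single-scale sketch reflects our own normalization choices and omits options such as patch-based FFT found in the official implementation:

```python
import torch

def focal_frequency_loss(pred, target, alpha=1.0):
    """Single-scale sketch: weight each frequency's squared error by its
    own (detached) error magnitude, so hard frequencies dominate."""
    pred_f = torch.fft.fft2(pred, norm='ortho')
    target_f = torch.fft.fft2(target, norm='ortho')
    dist = (pred_f - target_f).abs() ** 2  # per-frequency squared error
    weight = dist.sqrt() ** alpha          # focal weight w = |error|^alpha
    weight = weight / weight.amax(dim=(-2, -1), keepdim=True).clamp(min=1e-12)
    weight = weight.detach()               # gradient flows through dist only
    return (weight * dist).mean()

pred = torch.rand(2, 3, 64, 64, requires_grad=True)
target = torch.rand(2, 3, 64, 64)
loss = focal_frequency_loss(pred, target)
loss.backward()
```

Setting alpha larger sharpens the focus on the hardest frequencies, while alpha = 0 recovers a plain (unweighted) spectral distance, which makes the focal mechanism easy to ablate.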