Tag Archive: Frequency Domain

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or suffer from long inference times due to the heavy computational cost of convolutions on high-resolution feature maps. In this paper, we focus on speeding up high-resolution photorealistic I2IT tasks based on closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we reveal that attribute transformations, such as illumination and color manipulation, relate more to the low-frequency component, while the content details can be adaptively refined on the high-frequency components. We consequently propose a Laplacian Pyramid Translation Network (LPTN) to perform these two tasks simultaneously, where we design a lightweight network for translating the low-frequency component at reduced resolution and a progressive masking strategy to efficiently refine the high-frequency ones. Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details. Extensive experimental results on various tasks demonstrate that the proposed method can translate 4K images in real time on a single standard GPU while achieving transformation performance comparable to existing methods.


Existing I2IT methods are limited by low-resolution images and long inference times. In this paper, we tackle high-resolution I2IT via closed-form Laplacian pyramid decomposition and reconstruction. We observe that illumination and color changes relate mainly to the low-frequency component of an image, while the image content resides in the high-frequency components. We propose a Laplacian Pyramid Translation Network (LPTN), a lightweight network that translates the low-frequency component at reduced resolution and refines the high-frequency components with a progressive masking strategy. Our model avoids most of the heavy computation while preserving as much image detail as possible; in experiments, it achieves real-time style transfer on 4K images.
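The closed-form decomposition and reconstruction that LPTN builds on can be illustrated on a 1D signal. The sketch below uses hypothetical helper names and a simple pairwise-average blur (the paper operates on 2D images with a fixed blur kernel); the point is that the Laplacian pyramid reconstructs the input exactly regardless of the downsampling filter, so only the translation network itself introduces changes:

```python
def downsample(x):
    # crude blur + subsample: average adjacent pairs
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def upsample(x):
    # nearest-neighbour expansion back to double length
    return [v for v in x for _ in range(2)]

def build_pyramid(x, levels):
    # each level stores the high-frequency residual; the last entry
    # is the low-frequency band (translated at reduced cost in LPTN)
    pyr = []
    for _ in range(levels):
        low = downsample(x)
        pyr.append([a - b for a, b in zip(x, upsample(low))])
        x = low
    pyr.append(x)
    return pyr

def reconstruct(pyr):
    # closed-form inverse: upsample and add residuals back, coarse to fine
    x = pyr[-1]
    for high in reversed(pyr[:-1]):
        x = [a + b for a, b in zip(upsample(x), high)]
    return x
```

Because each residual is defined as `x - upsample(downsample(x))`, adding it back cancels the blur exactly, which is why the decomposition is lossless.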

SWAGAN: A Style-based Wavelet-driven Generative Model

In recent years, considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). Even so, these networks still suffer from degradation in quality for high-frequency content, stemming from a spectrally biased architecture and similarly unfavorable loss functions. To address this issue, we present a novel general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain. SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach yields enhancements in the visual quality of the generated images, and considerably increases computational performance. We demonstrate the advantage of our method by integrating it into the StyleGAN2 framework, and verifying that content generation in the wavelet domain leads to higher quality images with more realistic high-frequency content. Furthermore, we verify that our model’s latent space retains the qualities that allow StyleGAN to serve as a basis for a multitude of editing tasks, and show that our frequency-aware approach also induces improved downstream visual quality.
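As a rough illustration of the wavelet decomposition SWAGAN builds on, here is a minimal 1D Haar transform sketch (illustrative function names, not the paper's architecture). It shows the two properties the generator exploits: the high band isolates high-frequency detail (it is zero over constant regions), and the transform is exactly invertible:

```python
import math

def haar_forward(x):
    # orthonormal 1D Haar DWT: split into low (average) and high (detail) bands
    s = math.sqrt(2.0)
    low = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return low, high

def haar_inverse(low, high):
    # exact inverse transform: recover the original samples pairwise
    out = []
    for l, h in zip(low, high):
        out += [(l + h) / s_inv, (l - h) / s_inv] if False else [(l + h) / math.sqrt(2.0), (l - h) / math.sqrt(2.0)]
    return out
```

In SWAGAN, analogous 2D wavelet transforms sit between resolution levels, so each level predicts frequency bands instead of raw pixels.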



FcaNet: Frequency Channel Attention Networks


Attention mechanisms, especially channel attention, have gained great success in the computer vision field. Many works focus on how to design efficient channel attention mechanisms while ignoring a fundamental problem, i.e., treating global average pooling (GAP) as the unquestionable pre-processing method. In this work, we start from a different view and rethink channel attention using frequency analysis. Based on the frequency analysis, we mathematically prove that the conventional GAP is a special case of feature decomposition in the frequency domain. With this proof, we naturally generalize the pre-processing of the channel attention mechanism in the frequency domain and propose FcaNet with a novel multi-spectral channel attention. The proposed method is simple but effective: it can be implemented within existing channel attention methods by changing only one line of code. Moreover, the proposed method achieves state-of-the-art results compared with other channel attention methods on image classification, object detection, and instance segmentation tasks. Our method improves Top-1 accuracy on ImageNet by 1.8% compared with the baseline SENet-50, with the same number of parameters and the same computational cost. Our code and models will be made publicly available.

Attention mechanisms, especially channel attention, have achieved great success in computer vision. Many works focus on designing efficient channel attention mechanisms while overlooking a basic question: the use of global average pooling (GAP) as the standard pre-processing step. In this paper, we rethink channel attention from a different, frequency-domain perspective. Based on frequency analysis, we mathematically prove that conventional GAP is a special case of feature decomposition in the frequency domain. With this proof, we naturally generalize the pre-processing of the channel attention mechanism in the frequency domain and propose FcaNet, a model based on multi-spectral channel attention. The proposed method is simple yet effective: it can be implemented within existing channel attention models by changing only one line of code. Moreover, our method achieves state-of-the-art performance on image classification, object detection, and instance segmentation. Using SENet-50 as the baseline, our method improves Top-1 accuracy on ImageNet by 1.8% with the same number of parameters and the same computational cost. Our code and models will be made publicly available.
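The paper's core observation, that GAP is the zero-frequency (DC) term of the 2D DCT up to a scale factor, can be checked numerically. A minimal sketch with illustrative helper names (not FcaNet's actual implementation, which selects multiple DCT frequencies per channel group):

```python
import math

def dct_basis(u, v, H, W):
    # unnormalized 2D DCT-II basis function at frequency (u, v)
    return [[math.cos(math.pi * u * (i + 0.5) / H) *
             math.cos(math.pi * v * (j + 0.5) / W)
             for j in range(W)] for i in range(H)]

def freq_component(feat, u, v):
    # project an H x W feature map onto one DCT basis function
    H, W = len(feat), len(feat[0])
    basis = dct_basis(u, v, H, W)
    return sum(feat[i][j] * basis[i][j] for i in range(H) for j in range(W))

def gap(feat):
    # conventional global average pooling
    H, W = len(feat), len(feat[0])
    return sum(sum(row) for row in feat) / (H * W)
```

At (u, v) = (0, 0) the basis is constant, so the projection is just the sum of the feature map; dividing by H*W recovers GAP exactly. FcaNet's "one line of code" change swaps this single DC projection for a set of projections at several chosen frequencies.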

Focal Frequency Loss for Generative Models

Despite the remarkable success of generative models in creating photorealistic images using deep neural networks, gaps can still exist between the real and generated images, especially in the frequency domain. In this study, we find that narrowing the frequency-domain gap can further improve image synthesis quality. To this end, we propose the focal frequency loss, a novel objective function that brings the optimization of generative models into the frequency domain. The proposed loss allows the model to dynamically focus on the frequency components that are hard to synthesize by down-weighting the easy frequencies. This objective function is complementary to existing spatial losses, offering strong resistance against the loss of important frequency information caused by the inherent biases of neural networks. We demonstrate the versatility and effectiveness of the focal frequency loss in improving various baselines in both perceptual quality and quantitative performance.
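The idea behind the focal frequency loss can be sketched in 1D with a naive DFT (hypothetical function names; the paper operates on 2D image spectra with its own weighting and normalization scheme): per-frequency distances between the real and generated spectra are re-weighted so that poorly matched, "hard" frequencies dominate the loss:

```python
import cmath

def dft(x):
    # naive discrete Fourier transform of a real-valued 1D signal
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def focal_frequency_loss(real, fake, alpha=1.0):
    # squared spectral distance per frequency
    dists = [abs(a - b) ** 2 for a, b in zip(dft(real), dft(fake))]
    mx = max(dists) or 1.0
    # hard (large-distance) frequencies get weights near 1, easy ones near 0
    weights = [(d / mx) ** alpha for d in dists]
    return sum(w * d for w, d in zip(weights, dists)) / len(dists)
```

Because the weights are themselves functions of the current distances, the focus shifts dynamically during training as previously hard frequencies are learned.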