FcaNet: Frequency Channel Attention Networks

https://arxiv.org/abs/2012.11879

Attention mechanisms, especially channel attention, have gained great success in the computer vision field. Many works focus on designing efficient channel attention mechanisms while ignoring a fundamental question: whether global average pooling (GAP) is really the right pre-processing method. In this work, we start from a different view and rethink channel attention using frequency analysis. Based on this frequency analysis, we mathematically prove that conventional GAP is a special case of feature decomposition in the frequency domain. With this proof, we naturally generalize the pre-processing of channel attention mechanisms to the frequency domain and propose FcaNet with a novel multi-spectral channel attention. The proposed method is simple but effective: our method can be implemented within existing channel attention methods by changing a single line of code. Moreover, the proposed method achieves state-of-the-art results compared with other channel attention methods on image classification, object detection, and instance segmentation tasks. Our method improves Top-1 accuracy on ImageNet by 1.8% over the baseline SENet-50, with the same number of parameters and the same computational cost. Our code and models will be made publicly available.
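The central claim, that GAP is the lowest-frequency component of the 2D discrete cosine transform (DCT), can be checked numerically: the (0, 0) DCT-II basis function is constant, so projecting a feature map onto it yields H·W times the map's mean. The following is a minimal NumPy sketch (the function name is illustrative, not taken from the FcaNet codebase):

```python
import numpy as np

def dct2_component(x, u, v):
    """Project a 2D feature map x onto the (u, v) 2D DCT-II basis function
    B[h, w] = cos(pi * (h + 0.5) * u / H) * cos(pi * (w + 0.5) * v / W)."""
    H, W = x.shape
    bh = np.cos(np.pi * (np.arange(H) + 0.5) * u / H)
    bw = np.cos(np.pi * (np.arange(W) + 0.5) * v / W)
    return float((x * np.outer(bh, bw)).sum())

x = np.random.rand(7, 7)          # a single channel's spatial feature map
gap = x.mean()                    # conventional GAP pre-processing
lowest = dct2_component(x, 0, 0)  # lowest-frequency DCT component
assert np.isclose(lowest, x.size * gap)  # GAP is the (0, 0) term up to scale
```

The generalization is then the "one line" change: instead of using only the (0, 0) component, each group of channels is pooled with a different `(u, v)` frequency, so the attention weights see more than the DC term.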

