LambdaNetworks: Modeling Long-Range Interactions Without Attention

We present lambda layers — an alternative framework to self-attention — for capturing long-range interactions between an input and structured contextual information (e.g. a pixel surrounded by other pixels). Lambda layers capture such interactions by transforming available contexts into linear functions, termed lambdas, and applying these linear functions to each input separately. Similar to linear attention, lambda layers bypass expensive attention maps, but in contrast, they model both content and position-based interactions which enables their application to large structured inputs such as images. The resulting neural network architectures, LambdaNetworks, significantly outperform their convolutional and attentional counterparts on ImageNet classification, COCO object detection and COCO instance segmentation, while being more computationally efficient. Additionally, we design LambdaResNets, a family of hybrid architectures across different scales, that considerably improves the speed-accuracy tradeoff of image classification models. LambdaResNets reach excellent accuracies on ImageNet while being 3.2 – 4.4x faster than the popular EfficientNets on modern machine learning accelerators. When training with an additional 130M pseudo-labeled images, LambdaResNets achieve up to a 9.5x speed-up over the corresponding EfficientNet checkpoints.

https://arxiv.org/abs/2102.08602

In this paper, we present lambda layers, an architecture that can serve as an alternative to self-attention. Lambda layers capture long-range interactions between an input and structured contextual information (e.g. a pixel and its surrounding pixels). They do so by transforming the available context into linear functions, termed lambdas, and applying these linear functions to each input separately. Similar to linear attention, lambda layers avoid computing expensive attention maps; in contrast to linear attention, however, they model both content-based and position-based interactions, which makes them applicable to large structured inputs such as images. Building lambda layers into LambdaNetworks yields strong results on object detection, image classification, and instance segmentation.
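To make the mechanism concrete, here is a minimal single-head sketch of a lambda layer in PyTorch. It is an illustration under simplifying assumptions, not the paper's implementation: the module name `LambdaLayerSketch` and the dense relative-position table are our own choices, and the paper's multi-query heads and local (convolutional) position lambdas are omitted.

```python
# Minimal single-head lambda layer sketch (hypothetical, simplified).
# Assumes the number of query positions equals the number of context positions.
import torch
import torch.nn as nn

class LambdaLayerSketch(nn.Module):
    def __init__(self, dim, dim_k=16, dim_v=64, n_ctx=49):
        super().__init__()
        # Queries come from the input; keys/values come from the context.
        self.to_q = nn.Linear(dim, dim_k, bias=False)
        self.to_k = nn.Linear(dim, dim_k, bias=False)
        self.to_v = nn.Linear(dim, dim_v, bias=False)
        # Learned relative position embeddings E: one |k|-dim embedding per
        # (query position, context position) pair, stored densely for clarity.
        self.pos_emb = nn.Parameter(torch.randn(n_ctx, n_ctx, dim_k) * 0.02)

    def forward(self, x, context):
        # x:       (batch, n, dim)  -- inputs, e.g. flattened pixels
        # context: (batch, m, dim)  -- contextual elements (here m == n == n_ctx)
        q = self.to_q(x)            # (b, n, k)
        k = self.to_k(context)      # (b, m, k)
        v = self.to_v(context)      # (b, m, v)

        # Normalize keys across context positions (softmax over m).
        k = k.softmax(dim=1)

        # Content lambda: a single linear map shared by every query position.
        content_lambda = torch.einsum('bmk,bmv->bkv', k, v)               # (b, k, v)

        # Position lambdas: one linear map per query position, built from
        # the position embeddings and the (un-normalized) values.
        position_lambdas = torch.einsum('nmk,bmv->bnkv', self.pos_emb, v)  # (b, n, k, v)

        # Apply the lambdas to each query independently -- no attention map
        # is ever materialized.
        content_out = torch.einsum('bnk,bkv->bnv', q, content_lambda)
        position_out = torch.einsum('bnk,bnkv->bnv', q, position_lambdas)
        return content_out + position_out                                  # (b, n, v)

# Example usage: 49 "pixels" of dimension 128, with the input as its own context.
layer = LambdaLayerSketch(dim=128, dim_k=16, dim_v=128, n_ctx=49)
x = torch.randn(2, 49, 128)
out = layer(x, x)  # shape (2, 49, 128)
```

Note that the content lambda costs O(m·k·v) regardless of the number of queries, and the per-position lambdas replace the O(n·m) attention map with position embeddings, which is what lets the layer scale to large structured inputs such as images.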
