Convolution-Free Medical Image Segmentation using Transformers

Like other applications in computer vision, medical image segmentation has been most successfully addressed using deep learning models that rely on the convolution operation as their main building block. Convolutions enjoy important properties such as sparse interactions, weight sharing, and translation equivariance. These properties give convolutional neural networks (CNNs) a strong and useful inductive bias for vision tasks. In this work we show that a different method, based entirely on self-attention between neighboring image patches and without any convolution operations, can achieve competitive or better results. Given a 3D image block, our network divides it into n^3 3D patches, where n = 3 or 5, and computes a 1D embedding for each patch. The network predicts the segmentation map for the center patch of the block based on the self-attention between these patch embeddings. We show that the proposed model can achieve segmentation accuracies that are better than state-of-the-art CNNs on three datasets. We also propose methods for pre-training this model on large corpora of unlabeled images. Our experiments show that, with pre-training, the advantage of our proposed network over CNNs can be significant when labeled training data is limited.

https://arxiv.org/abs/2102.13645

As with other computer vision tasks, deep learning models that rely on the convolution operation have achieved great success in medical image segmentation. Convolutions have many advantages, such as sparse interactions, weight sharing, and translation equivariance, which make convolutional neural networks powerful and widely used in many vision applications. In this paper we propose a different method, based entirely on self-attention between neighboring image patches and requiring no convolution operations, and it achieves performance comparable to or even better than convolutional models. Our model takes a 3D image block and splits it into n^3 patches, where n = 3 or 5; we then compute a 1D embedding for each patch. The network predicts the segmentation of the center patch based on the self-attention among the neighboring patch embeddings. We find that our proposed model outperforms CNN models on segmentation tasks. Our model can also be pre-trained on large corpora of unlabeled images, and with this pre-training it leads CNN models when labeled training data is scarce.
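To make the block-to-patch pipeline concrete, here is a minimal PyTorch sketch of the idea: embed each flattened 3D patch into a 1D vector, run self-attention over the n^3 patch embeddings, and decode per-voxel logits for the center patch only. The class name `SelfAttnSegmenter`, the patch size, embedding width, and layer counts are illustrative assumptions, not the authors' configuration; see the linked arXiv preprint for the actual model and pre-training scheme.

```python
# A minimal sketch, NOT the authors' exact architecture. Patch size,
# embedding width, depth, and head count are assumed values.
import torch
import torch.nn as nn

class SelfAttnSegmenter(nn.Module):
    """Predict the segmentation of the center patch of an n x n x n
    block of 3D patches via self-attention over patch embeddings."""

    def __init__(self, n=3, patch_size=8, channels=1, num_classes=2,
                 dim=256, heads=8, depth=4):
        super().__init__()
        self.num_classes = num_classes
        self.patch_size = patch_size
        # 1D embedding of each flattened 3D patch
        self.embed = nn.Linear(channels * patch_size ** 3, dim)
        # learned position embedding, one vector per patch in the block
        self.pos = nn.Parameter(torch.zeros(1, n ** 3, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # per-voxel class logits, decoded for the center patch only
        self.head = nn.Linear(dim, num_classes * patch_size ** 3)

    def forward(self, patches):
        # patches: (batch, n**3, channels * patch_size**3), flattened
        x = self.embed(patches) + self.pos
        x = self.encoder(x)             # self-attention between patches
        center = x[:, x.shape[1] // 2]  # embedding of the center patch
        logits = self.head(center)
        return logits.view(-1, self.num_classes, self.patch_size,
                           self.patch_size, self.patch_size)

# usage: a block of 3**3 = 27 patches, each 8**3 voxels, single channel
model = SelfAttnSegmenter(n=3)
block = torch.randn(2, 27, 8 ** 3)
print(model(block).shape)  # torch.Size([2, 2, 8, 8, 8])
```

Note that with n = 3 the block holds 27 patches and with n = 5 it holds 125, so the attention cost stays small compared with attending over a whole volume; segmenting a full image then amounts to sliding this block over it one center patch at a time.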
