TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard and achieved tremendous success. However, due to the intrinsic locality of convolution operations, U-Net generally demonstrates limitations in explicitly modeling long-range dependency. Transformers, designed for sequence-to-sequence prediction, have emerged as alternative architectures with innate global self-attention mechanisms, but can result in limited localization abilities due to insufficient low-level details. In this paper, we propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation. On one hand, the Transformer encodes tokenized image patches from a convolution neural network (CNN) feature map as the input sequence for extracting global contexts. On the other hand, the decoder upsamples the encoded features which are then combined with the high-resolution CNN feature maps to enable precise localization. 
We argue that Transformers can serve as strong encoders for medical image segmentation tasks, with the combination of U-Net to enhance finer details by recovering localized spatial information. TransUNet achieves superior performances to various competing methods on different medical applications including multi-organ segmentation and cardiac segmentation.
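To make the hybrid encoder concrete, below is a minimal PyTorch sketch of the idea described in the abstract, not the authors' released code: a CNN backbone downsamples the image, each 1x1 spatial position of the resulting feature map is treated as a patch token, and a standard Transformer encoder adds global self-attention. The toy convolutional backbone (standing in for the ResNet used in the paper's experiments), the channel widths, and the fixed 224-pixel input size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridEncoder(nn.Module):
    """Rough sketch of a CNN + Transformer hybrid encoder (illustrative only)."""
    def __init__(self, img_size=224, in_ch=3, embed_dim=768, depth=12, heads=12):
        super().__init__()
        # Toy CNN backbone standing in for the paper's ResNet stem; downsamples by 16x.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(),
        )
        grid = img_size // 16  # spatial size of the downsampled feature map
        # 1x1 "patch" embedding: every spatial position of the CNN feature map
        # becomes one token of dimension embed_dim.
        self.patch_embed = nn.Conv2d(512, embed_dim, kernel_size=1)
        self.pos_embed = nn.Parameter(torch.zeros(1, grid * grid, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        feat = self.cnn(x)                          # (B, 512, H/16, W/16)
        tokens = self.patch_embed(feat)             # (B, D, H/16, W/16)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, D), N = (H/16)*(W/16)
        tokens = tokens + self.pos_embed            # learnable position embedding
        return self.transformer(tokens)             # globally contextualized tokens
```

The intermediate CNN activations (at 1/2, 1/4 and 1/8 resolution in a 16x-downsampling backbone) are what the U-Net-style decoder later reuses as skip connections.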

https://arxiv.org/abs/2102.04306

Medical image segmentation is a fundamental task for healthcare systems, especially for disease diagnosis and treatment planning. Across a wide range of medical image segmentation tasks, U-Net-style architectures have become the standard approach and have achieved tremendous success. However, owing to the intrinsic locality of convolution operations, U-Net struggles to model long-range dependencies. Transformers, designed for sequence-to-sequence prediction, provide global self-attention by construction, but their lack of low-level detail limits localization accuracy. In this paper, we propose TransUNet, an effective medical image segmentation method that combines the Transformer with U-Net. On one hand, the Transformer encodes tokenized CNN feature maps to extract global context. On the other hand, the decoder upsamples the encoded features and fuses them with multi-resolution CNN features, preserving precise localization alongside the global context. TransUNet performs well on multi-organ and cardiac segmentation tasks.
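The decoding path can be sketched in the same spirit. Again this is a rough illustration rather than the official implementation: the Transformer output tokens are reshaped back into a 2D feature map, then repeatedly upsampled and concatenated with the higher-resolution CNN feature maps through U-Net-style skip connections before a 1x1 segmentation head. Channel widths, the number of decoder stages, and the class count are assumptions.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Upsample 2x, concatenate a CNN skip feature, then refine with convolutions."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)  # fuse low-level detail from the CNN
        return self.conv(x)

class UNetDecoder(nn.Module):
    """Rough sketch of the upsampling path with skip connections (illustrative only)."""
    def __init__(self, embed_dim=768, skip_chs=(256, 128, 64), n_classes=2):
        super().__init__()
        chs = (256, 128, 64)
        self.blocks = nn.ModuleList([
            DecoderBlock(embed_dim if i == 0 else chs[i - 1], skip_chs[i], chs[i])
            for i in range(3)
        ])
        self.head = nn.Conv2d(chs[-1], n_classes, kernel_size=1)

    def forward(self, tokens, grid_hw, skips):
        # tokens: (B, N, D) from the Transformer; skips: CNN features ordered
        # coarse to fine (1/8, 1/4, 1/2 resolution).
        B, N, D = tokens.shape
        H, W = grid_hw
        x = tokens.transpose(1, 2).reshape(B, D, H, W)  # sequence -> 2D feature map
        for block, skip in zip(self.blocks, skips):
            x = block(x, skip)
        logits = self.head(x)                            # per-pixel class scores
        return nn.functional.interpolate(logits, scale_factor=2,
                                         mode="bilinear", align_corners=False)
```

Paired with the encoder sketch above, the skip features would come from the intermediate CNN stages, so the decoder recovers the localized spatial information that the tokenized Transformer input alone would lose.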
