Tag archive: Image segmentation

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard and achieved tremendous success. However, due to the intrinsic locality of convolution operations, U-Net generally demonstrates limitations in explicitly modeling long-range dependency. Transformers, designed for sequence-to-sequence prediction, have emerged as alternative architectures with innate global self-attention mechanisms, but can result in limited localization abilities due to insufficient low-level details. In this paper, we propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation. On one hand, the Transformer encodes tokenized image patches from a convolution neural network (CNN) feature map as the input sequence for extracting global contexts. On the other hand, the decoder upsamples the encoded features which are then combined with the high-resolution CNN feature maps to enable precise localization. 
We argue that Transformers can serve as strong encoders for medical image segmentation tasks, with the combination of U-Net to enhance finer details by recovering localized spatial information. TransUNet achieves superior performances to various competing methods on different medical applications including multi-organ segmentation and cardiac segmentation.

https://arxiv.org/abs/2102.04306

Medical image segmentation is an essential prerequisite for healthcare systems, especially for disease diagnosis and treatment planning. Across a wide range of medical image segmentation tasks, U-Net-style architectures have become the de-facto standard and achieved tremendous success. However, because of the intrinsic locality of convolution, U-Net struggles to model long-range dependencies explicitly. Transformers, designed for sequence-to-sequence prediction, offer global self-attention, but their localization ability can suffer because low-level detail is lost. In this paper, we propose TransUNet, which combines the strengths of Transformers and U-Net as an effective method for medical image segmentation. On one hand, the Transformer encodes tokenized patches of the CNN feature map to extract global context; on the other, the decoder upsamples the encoded features and fuses them with high-resolution CNN feature maps, so that both global context and precise localization are preserved. TransUNet achieves strong performance on several applications, including multi-organ and cardiac segmentation.
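Below is a minimal, hedged PyTorch sketch of this hybrid design (not the authors' implementation): a small CNN produces a feature map, its spatial positions are tokenized and passed through a Transformer encoder for global context, and the decoder upsamples the result and fuses it with a high-resolution CNN skip feature. All layer sizes are illustrative assumptions; positional embeddings and the paper's ResNet/ViT configuration are omitted for brevity.

```python
# A minimal sketch of the TransUNet idea (illustrative assumptions, not the paper's config).
import torch
import torch.nn as nn

class TinyTransUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2, dim=256, depth=4, heads=8):
        super().__init__()
        # CNN encoder: two downsampling stages (the paper uses a deeper ResNet-style CNN)
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU())
        # Transformer over the tokenized low-resolution feature map (global context)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        # Decoder: upsample and fuse with the high-resolution CNN skip feature
        self.up = nn.ConvTranspose2d(dim, 64, 2, stride=2)
        self.fuse = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        skip = self.enc1(x)                        # (B, 64, H/2, W/2) high-resolution skip
        feat = self.enc2(skip)                     # (B, dim, H/4, W/4)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, h*w, dim) patch tokens
        tokens = self.transformer(tokens)          # global self-attention
        feat = tokens.transpose(1, 2).reshape(b, c, h, w)
        up = self.up(feat)                         # back to the skip resolution
        out = self.fuse(torch.cat([up, skip], dim=1))
        return nn.functional.interpolate(self.head(out), scale_factor=2, mode="bilinear")

mask_logits = TinyTransUNet()(torch.randn(1, 1, 64, 64))   # (1, 2, 64, 64)
```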

Convolution-Free Medical Image Segmentation using Transformers

Like other applications in computer vision, medical image segmentation has been most successfully addressed using deep learning models that rely on the convolution operation as their main building block. Convolutions enjoy important properties such as sparse interactions, weight sharing, and translation equivariance. These properties give convolutional neural networks (CNNs) a strong and useful inductive bias for vision tasks. In this work we show that a different method, based entirely on self-attention between neighboring image patches and without any convolution operations, can achieve competitive or better results. Given a 3D image block, our network divides it into n^3 3D patches, where n = 3 or 5, and computes a 1D embedding for each patch. The network predicts the segmentation map for the center patch of the block based on the self-attention between these patch embeddings. We show that the proposed model can achieve segmentation accuracies that are better than state-of-the-art CNNs on three datasets. We also propose methods for pre-training this model on large corpora of unlabeled images. Our experiments show that with pre-training the advantage of our proposed network over CNNs can be significant when labeled training data is small.

https://arxiv.org/abs/2102.13645

As in other computer vision tasks, deep models built around convolution have achieved great success in medical image segmentation. Convolution enjoys useful properties such as sparse interactions, weight sharing, and translation equivariance, which give convolutional neural networks a strong inductive bias and broad applicability in vision. In this paper we propose a different approach, based entirely on self-attention between neighboring image patches and free of convolution, which achieves performance comparable to or better than convolutional models. The model takes a 3D image block, splits it into n^3 patches (with n = 3 or 5), and computes a 1D embedding for each patch; the segmentation of the center patch is then predicted from the self-attention between these patch embeddings. The proposed model outperforms CNN models on the segmentation task. It can also be pre-trained on large corpora of unlabeled images, and this pre-training gives it a clear advantage over CNNs when labeled data are scarce.
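The mechanism can be sketched in a few lines of PyTorch. The sketch below is my own simplification, not the authors' code: a 3D block is cut into n^3 patches, each patch is flattened into a 1D embedding, self-attention mixes the patch embeddings, and the segmentation of the center patch is predicted from its attended embedding. The patch size, embedding width, and single-channel input are assumptions.

```python
# A rough sketch of the convolution-free, center-patch prediction idea (assumed sizes).
import torch
import torch.nn as nn

n, p, dim = 3, 8, 128          # n^3 patches per block, each patch p^3 voxels (assumed)

class CenterPatchSegmenter(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.embed = nn.Linear(p ** 3, dim)                 # 1D embedding per 3D patch
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, num_classes * p ** 3)    # labels for the center patch

    def forward(self, block):                                # block: (B, n*p, n*p, n*p)
        b = block.shape[0]
        patches = (block.unfold(1, p, p).unfold(2, p, p).unfold(3, p, p)
                        .reshape(b, n ** 3, p ** 3))         # (B, n^3, p^3)
        tokens = self.attn(self.embed(patches))              # self-attention over patches
        center = tokens[:, n ** 3 // 2]                      # embedding of the center patch
        return self.head(center).reshape(b, -1, p, p, p)     # (B, classes, p, p, p)

logits = CenterPatchSegmenter()(torch.randn(2, n * p, n * p, n * p))
```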

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

Although using convolutional neural networks (CNNs) as backbones achieves great successes in computer vision, this work investigates a simple backbone network useful for many dense prediction tasks without convolutions. Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks. PVT has several merits compared to prior arts. (1) Different from ViT, which typically has low-resolution outputs and high computational and memory cost, PVT can not only be trained on dense partitions of the image to achieve the high output resolution that dense prediction requires, but also uses a progressive shrinking pyramid to reduce the computation of large feature maps. (2) PVT inherits the advantages of both CNN and Transformer, making it a unified backbone for various vision tasks without convolutions by simply replacing CNN backbones. (3) We validate PVT by conducting extensive experiments, showing that it boosts the performance of many downstream tasks, e.g., object detection, semantic, and instance segmentation. For example, with a comparable number of parameters, RetinaNet+PVT achieves 40.4 AP on the COCO dataset, surpassing RetinaNet+ResNet50 (36.3 AP) by 4.1 absolute AP. We hope PVT could serve as an alternative and useful backbone for pixel-level predictions and facilitate future research.

https://arxiv.org/abs/2102.12122

Although models with convolutional neural network (CNN) backbones have been hugely successful in computer vision, this paper proposes a simple convolution-free backbone that can serve many dense prediction tasks. Unlike recently proposed Transformer models such as ViT, which are designed for image classification, the Pyramid Vision Transformer (PVT) overcomes the difficulties of applying Transformers to dense prediction. Compared with prior work, PVT has the following merits: (1) unlike existing ViT models, which produce low-resolution outputs at high computational cost, PVT can be trained on dense partitions of the image to reach high output resolution, and it uses a progressive shrinking pyramid to reduce the computation on large feature maps; (2) PVT inherits the advantages of both CNNs and Transformers, so it can serve as a unified, convolution-free backbone in many vision tasks by simply replacing the CNN backbone; (3) we validate PVT on downstream tasks such as object detection and semantic/instance segmentation, and the experiments show that it consistently boosts their performance (e.g. RetinaNet+PVT reaches 40.4 AP on COCO versus 36.3 AP for RetinaNet+ResNet50).
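The "progressive shrinking pyramid" can be illustrated with a short sketch (my own assumptions, not the released PVT code): each stage re-embeds the feature map with a strided patch embedding, so the token count shrinks stage by stage while the channel width grows, producing CNN-like multi-scale features from an attention-only backbone. The real PVT additionally uses spatial-reduction attention to keep the early, high-resolution stages affordable; plain attention is used here for brevity.

```python
# A compact sketch of a PVT-like pyramid of Transformer stages (illustrative sizes).
import torch
import torch.nn as nn

class PVTStage(nn.Module):
    def __init__(self, in_ch, out_ch, stride, depth=2, heads=2):
        super().__init__()
        # strided "patch embedding" shrinks the spatial resolution of this stage
        self.patch_embed = nn.Conv2d(in_ch, out_ch, kernel_size=stride, stride=stride)
        layer = nn.TransformerEncoderLayer(d_model=out_ch, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        x = self.patch_embed(x)                       # (B, C, H/s, W/s)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)         # (B, H*W/s^2, C)
        tokens = self.blocks(tokens)                  # attention at this pyramid level
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Overall strides 4, 8, 16, 32, mimicking a CNN feature pyramid without convolutional blocks.
stages = nn.Sequential(PVTStage(3, 64, 4), PVTStage(64, 128, 2),
                       PVTStage(128, 256, 2), PVTStage(256, 512, 2))
feats = stages(torch.randn(1, 3, 224, 224))           # (1, 512, 7, 7) at the last stage
```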

RoI Tanh-polar Transformer Network for Face Parsing in the Wild

Face parsing aims to predict pixel-wise labels for facial components of a target face in an image. Existing approaches usually crop the target face from the input image with respect to a bounding box calculated during pre-processing, and thus can only parse inner facial Regions of Interest (RoIs). Peripheral regions like hair are ignored and nearby faces that are partially included in the bounding box can cause distractions. Moreover, these methods are only trained and evaluated on near-frontal portrait images and thus their performance for in-the-wild cases was unexplored. To address these issues, this paper makes three contributions. First, we introduce the iBugMask dataset for face parsing in the wild, containing 1,000 manually annotated images with large variations in sizes, poses, expressions and background, and Helen-LP, a large-pose training set containing 21,866 images generated using head pose augmentation. Second, we propose the RoI Tanh-polar transform that warps the whole image to a Tanh-polar representation with a fixed ratio between the face area and the context, guided by the target bounding box. The new representation contains all information in the original image, and allows for rotation equivariance in the convolutional neural networks (CNNs). Third, we propose a hybrid residual representation learning block, coined HybridBlock, that contains convolutional layers in both the Tanh-polar space and the Tanh-Cartesian space, allowing for receptive fields of different shapes in CNNs. Through extensive experiments, we show that the proposed method significantly improves the state-of-the-art for face parsing in the wild.

https://arxiv.org/abs/2102.02717

Face parsing predicts pixel-wise labels for the facial components of a target face. Existing methods crop the target face with a bounding box computed in pre-processing, so they can only parse the inner facial RoIs: peripheral regions such as hair are ignored, and nearby faces partially included in the box cause distractions. Moreover, these methods are trained and tested only on near-frontal portrait images, so their performance on in-the-wild data has not been evaluated. To address these issues, this paper introduces the iBugMask dataset, with 1,000 manually annotated images covering large variations in size, pose, expression and background, and Helen-LP, a large-pose training set of 21,866 images generated by head-pose augmentation from real face images. It then proposes the RoI Tanh-polar transform, which, guided by the target bounding box, warps the whole image into a Tanh-polar representation with a fixed ratio between the face area and its context; this representation retains all the information of the original image and provides rotation equivariance in CNNs. Finally, it proposes a hybrid residual learning block, called HybridBlock, which contains convolutional layers in both the Tanh-polar and Tanh-Cartesian spaces and allows receptive fields of different shapes. Experiments show that the model achieves state-of-the-art results on in-the-wild data.
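As a rough illustration of the warping idea, the sketch below implements a simplified RoI tanh-polar transform with torch.nn.functional.grid_sample. It is my own approximation of the transform described in the abstract, not the authors' formulation; roi_radius and the 0.5 face/context ratio are assumed parameters.

```python
# Simplified RoI tanh-polar warp: every output pixel is indexed by (angle, normalized radius);
# the radius passes through artanh so the RoI fills a fixed fraction of the output while the
# rest of the image is compressed into the remaining band.
import math
import torch
import torch.nn.functional as F

def roi_tanh_polar_warp(img, center, roi_radius, out_h=256, out_w=256, face_ratio=0.5):
    """img: (B, C, H, W); center: (cx, cy) in pixels; roi_radius: RoI radius in pixels."""
    b, c, h, w = img.shape
    theta = torch.linspace(0, 2 * math.pi, out_h).view(out_h, 1).expand(out_h, out_w)
    r = torch.linspace(0, 0.999, out_w).view(1, out_w).expand(out_h, out_w)
    # artanh maps the band [0, face_ratio] onto the RoI; the rest covers the context
    radius_px = torch.atanh(r) * roi_radius / math.atanh(face_ratio)
    src_x = center[0] + radius_px * torch.cos(theta)
    src_y = center[1] + radius_px * torch.sin(theta)
    # grid_sample expects sampling coordinates normalized to [-1, 1]
    grid = torch.stack([2 * src_x / (w - 1) - 1, 2 * src_y / (h - 1) - 1], dim=-1)
    return F.grid_sample(img, grid.expand(b, -1, -1, -1), align_corners=True)

warped = roi_tanh_polar_warp(torch.rand(1, 3, 512, 512), center=(256.0, 256.0), roi_radius=120.0)
```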

U-Noise: Learnable Noise Masks for Interpretable Image Segmentation

Deep Neural Networks (DNNs) are widely used for decision making in a myriad of critical applications, ranging from medical to societal and even judicial. Given the importance of these decisions, it is crucial for us to be able to interpret these models. We introduce a new method for interpreting image segmentation models by learning regions of images in which noise can be applied without hindering downstream model performance. We apply this method to segmentation of the pancreas in CT scans, and qualitatively compare the quality of the method to existing explainability techniques, such as Grad-CAM and occlusion sensitivity. Additionally we show that, unlike other methods, our interpretability model can be quantitatively evaluated based on the downstream performance over obscured images.

https://arxiv.org/abs/2101.05791

Deep neural networks are widely used for decision making in many critical applications, from medicine to society and even the judiciary, so the interpretability of those decisions matters a great deal. We propose a new method for interpreting image segmentation models by learning the regions of an image in which noise can be applied without hurting downstream model performance. We apply the method to pancreas segmentation in CT scans and qualitatively compare it with existing explainability techniques such as Grad-CAM and occlusion sensitivity. Unlike other methods, our interpretability model can also be evaluated quantitatively through the downstream performance on the obscured images.
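One plausible way to train such a noise mask, consistent with the abstract but not necessarily the paper's exact objective, is sketched below: a mask network learns where noise can be injected without hurting a frozen downstream segmentation model, and a coverage term rewards applying noise as widely as possible. mask_net, frozen_seg and the weighting lam are assumed names and values.

```python
# Hedged sketch of a learnable noise-mask objective (assumed formulation, not the paper's code).
import torch
import torch.nn.functional as F

def interpretability_step(mask_net, frozen_seg, x, y, lam=0.1):
    """x: (B, C, H, W) images; y: (B, H, W) ground-truth labels for the frozen model."""
    m = torch.sigmoid(mask_net(x))                     # (B, 1, H, W) noise mask in [0, 1]
    noised = x * (1 - m) + torch.randn_like(x) * m     # replace masked regions with noise
    logits = frozen_seg(noised)                        # downstream model, weights kept frozen
    seg_loss = F.cross_entropy(logits, y)              # downstream performance must survive
    return seg_loss - lam * m.mean()                   # minus term rewards wide noise coverage
```

The learned mask can then be scored quantitatively, as the abstract suggests, by measuring downstream Dice on the obscured images: regions that tolerate noise are unimportant to the decision, while low-mask regions explain it.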

Vox2Vox: 3D-GAN for Brain Tumour Segmentation

Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histological sub-regions, i.e., peritumoral edema, necrotic core, enhancing and non-enhancing tumor core. Although these tumor sub-regions can easily be detected using multi-modal MRI, segmenting them exactly is a challenging task. Hence, using the data provided by the BraTS Challenge 2020, we propose a 3D volume-to-volume Generative Adversarial Network for segmentation of brain tumours. The model, called Vox2Vox, generates realistic segmentation outputs from multi-channel 3D MR images, detecting the whole, core and enhancing tumor with median Dice scores of 93.39%, 92.50%, and 87.16% and 95th percentile Hausdorff distances of 2.44mm, 2.23mm, and 1.73mm on the training dataset, and 91.75%, 88.13%, and 85.87% and 3.0mm, 3.74mm, and 2.23mm on the validation dataset, after ensembling 10 Vox2Vox models obtained with a 10-fold cross-validation.

https://arxiv.org/pdf/2003.13653.pdf

Gliomas are the most common primary brain malignancies. They vary in aggressiveness, have unpredictable prognosis, and contain heterogeneous sub-regions such as peritumoral edema, the necrotic core, and the enhancing and non-enhancing tumor core. Although these sub-regions are easy to detect with multi-modal MRI, segmenting them precisely remains a challenge. We therefore propose a 3D volume-to-volume generative adversarial network, called Vox2Vox, and evaluate it on the BraTS Challenge 2020 data; it produces realistic segmentations from multi-channel 3D MR images, delineating the whole, core and enhancing tumor.
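The abstract reports Dice scores and 95th percentile Hausdorff distances per tumor region. Below is a small, generic Dice implementation for 3D volumes (the standard formula, not tied to the Vox2Vox code); pred and target are assumed to be binary masks for a single region such as "whole tumor".

```python
# Per-volume Dice score for 3D binary segmentation masks.
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """pred, target: (B, D, H, W) binary volumes for a single tumor region."""
    pred, target = pred.float(), target.float()
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (2 * inter + eps) / (union + eps)           # Dice in [0, 1] per volume

print(dice_score(torch.ones(1, 8, 8, 8), torch.ones(1, 8, 8, 8)))   # tensor([1.])
```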

COVID TV-UNet: Segmenting COVID-19 Chest CT Images Using Connectivity Imposed U-Net

The novel corona-virus disease (COVID-19) pandemic has caused a major outbreak in more than 200 countries around the world, leading to a severe impact on the health and life of many people globally. As of mid-July 2020, more than 12 million people were infected, and more than 570,000 deaths were reported. Computed Tomography (CT) images can be used as an alternative to the time-consuming RT-PCR test to detect COVID-19. In this work we propose a segmentation framework to detect chest regions in CT images which are infected by COVID-19. We use an architecture similar to the U-Net model, and train it to detect ground glass regions on the pixel level. As the infected regions tend to form a connected component (rather than randomly distributed pixels), we add a suitable regularization term to the loss function to promote connectivity of the segmentation map for COVID-19 pixels. 2D anisotropic total variation is used for this purpose, and therefore the proposed model is called "TV-UNet". Through experimental results on a relatively large-scale CT segmentation dataset of around 900 images, we show that adding this new regularization term leads to a 2% gain in overall segmentation performance compared to the U-Net model. Our experimental analysis, ranging from visual evaluation of the predicted segmentation results to quantitative assessment of segmentation performance (precision, recall, Dice score, and mIoU), demonstrated great ability to identify COVID-19 associated regions of the lungs, achieving a mIoU rate of over 99%, and a Dice score of around 86%.

This paper proposes a segmentation framework for detecting chest regions infected by COVID-19 in CT images. We use a U-Net-like architecture and train it to detect ground-glass regions at the pixel level. Because infected regions tend to form connected components rather than randomly scattered pixels, we add a suitable regularization term to the loss function to promote connectivity of the COVID-19 segmentation map. Experiments on a relatively large CT segmentation dataset of around 900 images show that the new regularization term yields a 2% improvement in overall segmentation performance over the plain U-Net. Our analysis, covering both visual inspection of the predicted segmentations and quantitative evaluation of segmentation performance, demonstrates a strong ability to identify COVID-19-related regions.

Paper: https://arxiv.org/pdf/2007.12303.pdf
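The key addition in TV-UNet is a 2D anisotropic total-variation term on the predicted COVID-19 probability map, which pushes the network toward connected, piecewise-constant segmentations. A minimal sketch of such a combined loss is shown below; the weighting lam is an assumption, not the paper's value.

```python
# Cross-entropy segmentation loss plus a 2D anisotropic total-variation regularizer.
import torch
import torch.nn.functional as F

def anisotropic_tv(prob):
    """prob: (B, H, W) predicted probability of the infected class."""
    dh = (prob[:, 1:, :] - prob[:, :-1, :]).abs().mean()   # vertical differences
    dw = (prob[:, :, 1:] - prob[:, :, :-1]).abs().mean()   # horizontal differences
    return dh + dw

def tv_unet_loss(logits, target, lam=0.01):
    """logits: (B, 2, H, W); target: (B, H, W) with 1 = infected pixel."""
    ce = F.cross_entropy(logits, target)
    covid_prob = torch.softmax(logits, dim=1)[:, 1]         # probability of the COVID-19 class
    return ce + lam * anisotropic_tv(covid_prob)            # TV term promotes connectivity
```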

MiniSeg: An Extremely Minimum Network for Efficient COVID-19 Segmentation

The rapid spread of the new pandemic, coronavirus disease 2019 (COVID-19), has seriously threatened global health. The gold standard for COVID-19 diagnosis is the tried-and-true polymerase chain reaction (PCR), but PCR is a laborious, time-consuming and complicated manual process that is in short supply. Deep learning based computer-aided screening, e.g., infection segmentation, is thus viewed as an alternative due to its great successes in medical imaging. However, the publicly available COVID-19 training data are limited, which would easily cause overfitting of traditional deep learning methods that are usually data-hungry with millions of parameters. On the other hand, fast training/testing and low computational cost are also important for quick deployment and development of computer-aided COVID-19 screening systems, but traditional deep learning methods, especially for image segmentation, are usually computationally intensive. To address the above problems, we propose MiniSeg, a lightweight deep learning model for efficient COVID-19 segmentation. Compared with traditional segmentation methods, MiniSeg has several significant strengths: i) it only has 472K parameters and is thus not easy to overfit; ii) it has high computational efficiency and is thus convenient for practical deployment; iii) it can be quickly retrained by other users using their private COVID-19 data for further improving performance. In addition, we build a comprehensive COVID-19 segmentation benchmark for comparing MiniSeg with traditional methods. Code and models will be released to promote the research and practical deployment of computer-aided COVID-19 screening.

Because the publicly available COVID-19 training data are limited, traditional deep learning methods, which are usually data-hungry with millions of parameters, can easily overfit. On the other hand, fast training/testing and low computational cost also matter for quick deployment and development, yet traditional deep learning methods, especially for image segmentation, are usually computationally intensive.

  • To address the above problems, we propose MiniSeg, a lightweight deep learning model for efficient COVID-19 segmentation. Compared with traditional segmentation methods, MiniSeg has several significant strengths:
    • i) it has only 472K parameters and is therefore not easy to overfit;
    • ii) it is highly computationally efficient and thus convenient for practical deployment;
    • iii) it can be quickly retrained by other users on their private COVID-19 data to further improve performance. In addition, we build a comprehensive COVID-19 segmentation benchmark for comparing MiniSeg with traditional methods.

Paper: https://arxiv.org/pdf/2004.09750.pdf
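The abstract does not spell out MiniSeg's building blocks, so as a generic illustration of how a segmentation network can stay in the ~472K-parameter range, here is a depthwise-separable convolution block (a common lightweight design, not necessarily MiniSeg's) together with a helper that counts a model's trainable parameters.

```python
# Depthwise-separable convolution block and a parameter-count helper (illustrative only).
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

class DWSeparableBlock(nn.Module):
    """Depthwise + pointwise convolution: far fewer weights than a dense 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

print(count_params(DWSeparableBlock(64, 128)))   # 8960 parameters vs 73856 for a dense 3x3 conv
```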

Unsupervised Learning of Image Segmentation Based on Differentiable Feature Clustering

Illustration of the proposed algorithm for training a CNN

The usage of convolutional neural networks (CNNs) for unsupervised image segmentation was investigated in this study. Similar to supervised image segmentation, the proposed CNN assigns labels to pixels that denote the cluster to which the pixel belongs. In unsupervised image segmentation, however, no training images or ground truth labels of pixels are specified beforehand. Therefore, once a target image is input, the pixel labels and feature representations are jointly optimized, and their parameters are updated by the gradient descent. In the proposed approach, label prediction and network parameter learning are alternately iterated to meet the following criteria: (a) pixels of similar features should be assigned the same label, (b) spatially continuous pixels should be assigned the same label, and (c) the number of unique labels should be large. Although these criteria are incompatible, the proposed approach minimizes the combination of similarity loss and spatial continuity loss to find a plausible solution of label assignment that balances the aforementioned criteria well. The contributions of this study are four-fold. First, we propose a novel end-to-end network of unsupervised image segmentation that consists of normalization and an argmax function for differentiable clustering. Second, we introduce a spatial continuity loss function that mitigates the limitations of fixed segment boundaries possessed by previous work. Third, we present an extension of the proposed method for segmentation with scribbles as user input, which showed better accuracy than existing methods while maintaining efficiency. Finally, we introduce another extension of the proposed method: unseen image segmentation by using networks pre-trained with a few reference images without re-training the networks. The effectiveness of the proposed approach was examined on several benchmark datasets of image segmentation.

This study investigates the use of convolutional neural networks for unsupervised image segmentation. As in supervised segmentation, the CNN assigns each pixel a label denoting the cluster it belongs to; in the unsupervised setting, however, no training images or ground-truth pixel labels are given in advance. Once a target image is input, the pixel labels and the feature representation are therefore optimized jointly, with their parameters updated by gradient descent. Label prediction and network parameter learning are alternated to satisfy the following criteria: (a) pixels with similar features should receive the same label, (b) spatially contiguous pixels should receive the same label, and (c) the number of distinct labels should be large. Although these criteria conflict, the method minimizes a combination of a similarity loss and a spatial continuity loss to find a label assignment that balances them well.

The study makes four contributions. First, it proposes a novel end-to-end network for unsupervised image segmentation consisting of normalization and an argmax function for differentiable clustering. Second, it introduces a spatial continuity loss that mitigates the fixed-segment-boundary limitation of earlier work. Third, it extends the method to segmentation with user-provided scribbles, achieving better accuracy than existing methods while remaining efficient. Finally, it presents a further extension: segmenting unseen images with networks pre-trained on a few reference images, without retraining. The effectiveness of the approach is verified on several image segmentation benchmark datasets.

Paper: https://arxiv.org/pdf/2007.09990.pdf

Project: https://github.com/kanezaki/pytorch-unsupervised-segmentation-tip/

Paper analysis (in Chinese): https://blog.csdn.net/qq_33034981/article/details/108203092#_97
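A condensed, hedged sketch of the training step described above is given below: the CNN's per-pixel responses are normalized, the argmax over channels gives pseudo-labels, and the network is trained to agree with its own labels (similarity loss) while a spatial continuity loss penalizes response changes between neighbouring pixels. The normalization, loss weighting and network interface are my simplifications, not the released implementation.

```python
# One unsupervised optimization step on a single image (no ground truth required).
import torch
import torch.nn.functional as F

def unsupervised_step(net, optimizer, img, mu=5.0):
    """img: (1, 3, H, W); net outputs (1, K, H, W) responses for K candidate clusters."""
    optimizer.zero_grad()
    response = net(img)
    # per-channel normalization of the responses (the paper uses batch normalization)
    response = (response - response.mean(dim=(2, 3), keepdim=True)) \
               / (response.std(dim=(2, 3), keepdim=True) + 1e-5)
    target = response.argmax(dim=1)                         # pseudo-labels via argmax
    similarity = F.cross_entropy(response, target)          # pixels agree with their own cluster
    dh = (response[:, :, 1:, :] - response[:, :, :-1, :]).abs().mean()
    dw = (response[:, :, :, 1:] - response[:, :, :, :-1]).abs().mean()
    loss = similarity + mu * (dh + dw)                      # add spatial continuity loss
    loss.backward()
    optimizer.step()
    return target                                           # current segmentation map
```

Iterating this step on one image makes the number of surviving clusters shrink as similar, neighbouring pixels collapse onto the same label, which is how the method balances criteria (a)-(c).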

ON SEGMENTATION OF PECTORAL MUSCLE IN DIGITAL MAMMOGRAMS BY MEANS OF DEEP LEARNING

Figure caption: upper row of subplots: (A1) input MLO mammogram, (A2) edge probability map OUT 1, (A3) edge probability map OUT 2, (A4) binary mask B, (A5) modified binary mask, (A6) final edge probability map, and (A7) the result of graph-based edge detection. Subplots B1-B7 are composed analogously for a different input image shown in subplot B1.

Computer-aided diagnosis (CAD) has long become an integral part of radiological management of breast disease, facilitating a number of important clinical applications, including quantitative assessment of breast density and early detection of malignancies based on X-ray mammography. Common to such applications is the need to automatically discriminate between breast tissue and adjacent anatomy, with the latter being predominantly represented by pectoralis major (or pectoral muscle). Especially in the case of mammograms acquired in the mediolateral oblique (MLO) view, the muscle is easily confusable with some elements of breast anatomy due to their morphological and photometric similarity. As a result, the problem of automatic detection and segmentation of pectoral muscle in MLO mammograms remains a challenging task, innovative approaches to which are still required and constantly searched for. To address this problem, the present paper introduces a two-step segmentation strategy based on a combined use of data-driven prediction (deep learning) and graph-based image processing. In particular, the proposed method employs a convolutional neural network (CNN) which is designed to predict the location of the breast-pectoral boundary at different levels of spatial resolution. Subsequently, the predictions are used by the second stage of the algorithm, in which the desired boundary is recovered as a solution to the shortest path problem on a specially designed graph. The proposed algorithm has been tested on three different datasets (i.e., MIAS, CBIS-DDSM and InBreast) using a range of quantitative metrics. The results of comparative analysis show considerable improvement over state-of-the-art, while offering the possibility of model-free and fully automatic processing.

Computer-aided diagnosis (CAD) has long been an integral part of the radiological management of breast disease, supporting many important clinical applications, including quantitative assessment of breast density and early detection of malignancies from X-ray mammography. Common to these applications is the need to automatically separate breast tissue from the adjacent anatomy, the latter being dominated by the pectoralis major (pectoral muscle). Particularly in mammograms acquired in the mediolateral oblique (MLO) view, the muscle is easily confused with parts of the breast anatomy because of their morphological and photometric similarity. Automatic detection and segmentation of the pectoral muscle in MLO mammograms therefore remains a difficult task for which innovative approaches are still needed and actively sought. To address this, the paper introduces a two-step segmentation strategy that combines data-driven prediction (deep learning) with graph-based image processing. Specifically, a convolutional neural network (CNN) is designed to predict the location of the breast-pectoral boundary at several spatial resolutions; its predictions are then used by the second stage of the algorithm, in which the desired boundary is recovered as the solution of a shortest-path problem on a specially designed graph. The algorithm was evaluated on three datasets (MIAS, CBIS-DDSM and InBreast) using a range of quantitative metrics.

Paper: https://arxiv.org/pdf/2008.12904.pdf
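The second stage recovers the breast-pectoral boundary as a shortest path over a graph built from the CNN's edge-probability map. Below is a heavily simplified dynamic-programming version of that idea (one boundary column per image row, with moves limited to neighbouring columns), intended only as an illustration of graph-based boundary tracing, not the paper's exact graph construction.

```python
# Seam-style shortest path through an edge-probability map, row by row.
import numpy as np

def trace_boundary(edge_prob: np.ndarray) -> np.ndarray:
    """edge_prob: (H, W) map in [0, 1]; returns the boundary column index for each row."""
    cost = 1.0 - edge_prob                        # high edge probability -> low traversal cost
    H, W = cost.shape
    acc = cost.copy()
    for y in range(1, H):                         # accumulate the cheapest path row by row
        left = np.concatenate(([np.inf], acc[y - 1, :-1]))
        right = np.concatenate((acc[y - 1, 1:], [np.inf]))
        acc[y] += np.minimum(np.minimum(left, acc[y - 1]), right)
    path = np.empty(H, dtype=int)
    path[-1] = int(acc[-1].argmin())              # backtrack from the cheapest end point
    for y in range(H - 2, -1, -1):
        lo, hi = max(path[y + 1] - 1, 0), min(path[y + 1] + 2, W)
        path[y] = lo + int(acc[y, lo:hi].argmin())
    return path

boundary_cols = trace_boundary(np.random.rand(64, 64))
```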