Tag archive: semantic segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

代码地址:https://github.com/tfzhou/ContrastiveSeg

Main idea. Current segmentation models learn to map pixels (b) to an embedding space (c), yet ignore intrinsic structures of the labeled data (i.e., inter-image relations among pixels from the same class, noted with the same color in (b)). Pixel-wise contrastive learning is introduced to foster a new training paradigm (d) by explicitly addressing intra-class compactness and inter-class dispersion. Each pixel (embedding) i is pulled closer to pixels of the same class, but pushed away from pixels of other classes. A better-structured embedding space (e) is thus derived, eventually boosting the performance of segmentation models.

Current semantic segmentation methods focus only on mining “local” context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria (e.g., IoU-like loss). However, they ignore “global” context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive algorithm for semantic segmentation in the fully supervised setting. The core idea is to enforce pixel embeddings belonging to a same semantic class to be more similar than embeddings from different classes. It raises a pixel-wise metric learning paradigm for semantic segmentation, by explicitly exploring the structures of labeled pixels, which were rarely explored before. Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing. We experimentally show that, with famous segmentation models (i.e., DeepLabV3, HRNet, OCR) and backbones (i.e., ResNet, HRNet), our method brings consistent performance improvements across diverse datasets (i.e., Cityscapes, PASCAL-Context, COCO-Stuff). We expect this work will encourage our community to rethink the current de facto training paradigm in fully supervised semantic segmentation.

Current semantic segmentation models focus on mining local context, e.g., dependencies between pixels within a single image, captured by context-aggregation modules or structure-aware optimization criteria (e.g., IoU-like losses). However, they ignore the global context of the training data, i.e., the semantic relations between pixels across different images.

This paper proposes a pixel-level contrastive algorithm for semantic segmentation in the fully supervised setting. The core idea is to force pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes. This yields a pixel-level metric-learning paradigm, realized by explicitly exploring the structure of the labeled pixels.

The proposed method can be incorporated into existing segmentation frameworks with little effort, and it adds no extra overhead at test time.
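The released ContrastiveSeg code adds hard-example sampling and a pixel memory bank on top of this idea; the snippet below is only a minimal PyTorch sketch of the core cross-image, supervised InfoNCE-style loss over sampled pixel embeddings, with the function name, temperature, and sampling cap chosen purely for illustration.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings, labels, temperature=0.1, max_pixels=1024):
    """Toy pixel-wise supervised contrastive loss (illustrative sketch).

    embeddings: (N, D) pixel embeddings sampled from one or more images
                (cross-image sampling is what provides the "global" context).
    labels:     (N,) semantic class id of each sampled pixel.
    """
    # Subsample so the N x N similarity matrix stays small.
    if embeddings.size(0) > max_pixels:
        idx = torch.randperm(embeddings.size(0), device=embeddings.device)[:max_pixels]
        embeddings, labels = embeddings[idx], labels[idx]

    z = F.normalize(embeddings, dim=1)                      # (N, D)
    sim = torch.matmul(z, z.t()) / temperature              # (N, N)

    # Positives: pixels of the same class, excluding self-pairs.
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    self_mask = torch.eye(len(labels), device=labels.device)
    pos_mask = pos_mask - self_mask

    # Log-softmax over all other pixels (positives pulled in, negatives pushed away).
    logits = sim - self_mask * 1e9                           # remove self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Mean log-likelihood of positives per anchor; anchors with no positive are skipped.
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0
    loss = -(pos_mask * log_prob).sum(1)[valid] / pos_counts[valid]
    return loss.mean()
```

During training such a term would simply be added to the usual pixel-wise cross-entropy; since it only shapes the embedding space, nothing changes at inference, which matches the claim of no extra overhead during testing.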

Trans2Seg: Transparent Object Segmentation with Transformer

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset. Unlike Trans10K-v1 that only has two limited categories, our new dataset has several appealing benefits. (1) It has 11 fine-grained categories of transparent objects, commonly occurring in the human domestic environment, making it more practical for real-world application. (2) Trans10K-v2 brings more challenges for the current advanced segmentation methods than its former version. Furthermore, a novel transformer-based segmentation pipeline termed Trans2Seg is proposed. Firstly, the transformer encoder of Trans2Seg provides the global receptive field in contrast to CNN’s local receptive field, which shows excellent advantages over pure CNN architectures. Secondly, by formulating semantic segmentation as a problem of dictionary look-up, we design a set of learnable prototypes as the query of Trans2Seg’s transformer decoder, where each prototype learns the statistics of one category in the whole dataset. We benchmark more than 20 recent semantic segmentation methods, demonstrating that Trans2Seg significantly outperforms all the CNN-based methods, showing the proposed algorithm’s potential ability to solve transparent object segmentation.

https://arxiv.org/abs/2101.08461

This paper presents Trans10K-v2, a fine-grained transparent object segmentation dataset that extends Trans10K-v1, the first large-scale transparent object segmentation dataset. Unlike Trans10K-v1, which has only two categories, Trans10K-v2 offers several advantages: (1) it contains 11 fine-grained categories of transparent objects that commonly occur in domestic environments; (2) it poses more challenges for current advanced segmentation methods than its predecessor. In addition, a transformer-based segmentation pipeline named Trans2Seg is proposed. First, the transformer encoder provides a global receptive field, in contrast to the local receptive field of CNNs. Second, segmentation is formulated as a dictionary look-up: a set of learnable prototypes serves as the queries of Trans2Seg's transformer decoder, and each prototype learns the statistics of one category over the whole dataset. Benchmarking against more than 20 recent segmentation methods shows that Trans2Seg significantly outperforms all CNN-based methods, demonstrating its potential for transparent object segmentation.
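To make the dictionary look-up formulation concrete, here is a rough PyTorch sketch of a decoder head with one learnable prototype query per class. The class count, feature dimension, and module names are illustrative assumptions, not the released Trans2Seg implementation.

```python
import torch
import torch.nn as nn

class PrototypeDecoder(nn.Module):
    """Illustrative "dictionary look-up" head: one learnable query per class."""

    def __init__(self, num_classes=12, dim=256, num_layers=4, num_heads=8):
        super().__init__()
        # One learnable prototype (query) per category, shared over the dataset.
        # num_classes=12 assumes 11 transparent categories + background.
        self.prototypes = nn.Embedding(num_classes, dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, feats):
        # feats: (B, C, H, W) features from a transformer (or CNN) encoder.
        b, c, h, w = feats.shape
        memory = feats.flatten(2).permute(2, 0, 1)                     # (H*W, B, C)
        queries = self.prototypes.weight.unsqueeze(1).repeat(1, b, 1)  # (K, B, C)
        decoded = self.decoder(queries, memory)                        # (K, B, C)

        # "Look-up": similarity between each pixel feature and each decoded
        # prototype gives per-class logits, reshaped into a segmentation map.
        logits = torch.einsum('kbc,bchw->bkhw', decoded, feats)
        return logits

# Usage with dummy features:
# head = PrototypeDecoder(num_classes=12)
# masks = head(torch.randn(2, 256, 32, 32))   # (2, 12, 32, 32)
```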

Semantic Segmentation of Pathological Lung Tissue With Dilated Fully Convolutional Networks


Early and accurate diagnosis of interstitial lung diseases (ILDs) is crucial for making treatment decisions, but can be challenging even for experienced radiologists. The diagnostic procedure is based on the detection and recognition of the different ILD pathologies in thoracic CT scans, yet their manifestation often appears similar. In this study, we propose the use of a deep purely convolutional neural network for the semantic segmentation of ILD patterns, as the basic component of a computer aided diagnosis system for ILDs. The proposed CNN, which consists of convolutional layers with dilated filters, takes as input a lung CT image of arbitrary size and outputs the corresponding label map. We trained and tested the network on a data set of 172 sparsely annotated CT scans, within a cross-validation scheme. The training was performed in an end-to-end and semisupervised fashion, utilizing both labeled and nonlabeled image regions. The experimental results show significant performance improvement with respect to the state of the art.

Early and accurate diagnosis of interstitial lung diseases (ILDs) is crucial for treatment decisions, but is challenging even for experienced radiologists. Diagnosis relies on detecting and recognizing the different ILD pathologies in thoracic CT scans, yet their manifestations often look similar. This study proposes a deep, purely convolutional neural network for semantic segmentation of ILD patterns, as the basic component of a computer-aided diagnosis system. The proposed CNN, composed of convolutional layers with dilated filters, takes a lung CT image of arbitrary size as input and outputs the corresponding label map. The network was trained and tested on a dataset of 172 sparsely annotated CT scans within a cross-validation scheme; training was end-to-end and semi-supervised, using both labeled and unlabeled image regions. Experimental results show a significant performance improvement over the state of the art.

项目地址:https://github.com/intact-project/LungNet

论文地址:https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8325482
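The exact depth, dilation rates, and normalization are defined in the LungNet repository; the snippet below is only an illustrative PyTorch sketch of the general pattern the abstract describes: a stack of dilated 3x3 convolutions with no downsampling, so the output label map keeps the resolution of the arbitrary-size input slice. The class count and channel width are assumptions.

```python
import torch
import torch.nn as nn

class DilatedFCN(nn.Module):
    """Minimal sketch of a dilated fully convolutional segmentation network.

    Dilation grows the receptive field without pooling, so the output label
    map keeps the spatial size of the input CT slice.
    """

    def __init__(self, in_ch=1, num_classes=6, width=32,
                 dilations=(1, 2, 4, 8, 16)):
        super().__init__()
        layers, ch = [], in_ch
        for d in dilations:
            # padding=d with a 3x3 kernel preserves the spatial size.
            layers += [nn.Conv2d(ch, width, 3, padding=d, dilation=d),
                       nn.BatchNorm2d(width),
                       nn.ReLU(inplace=True)]
            ch = width
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Conv2d(width, num_classes, 1)  # 1x1 conv -> label map

    def forward(self, x):
        return self.classifier(self.features(x))

# net = DilatedFCN(num_classes=6)                 # class count is an assumption
# logits = net(torch.randn(1, 1, 512, 512))       # (1, 6, 512, 512), same H and W
```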

Context-aware Feature Generation for Zero-shot Semantic Segmentation


论文地址:https://arxiv.org/pdf/2008.06893.pdf

代码地址:https://github.com/bcmi/CaGNet-Zero-Shot-Semantic-Segmentation

Existing semantic segmentation models heavily rely on dense pixelwise annotations. To reduce the annotation pressure, we focus on a challenging task named zero-shot semantic segmentation, which aims to segment unseen objects with zero annotations. This task can be accomplished by transferring knowledge across categories via semantic word embeddings. In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet. In particular, with the observation that a pixel-wise feature highly depends on its contextual information, we insert a contextual module in a segmentation network to capture the pixel-wise contextual information, which guides the process of generating more diverse and context-aware features from semantic word embeddings. Our method achieves state-of-the-art results on three benchmark datasets for zero-shot segmentation.

Existing semantic segmentation methods rely heavily on dense pixel-level annotations. To reduce the annotation burden, this work focuses on the challenging task of zero-shot semantic segmentation, which aims to segment unseen objects with zero annotations. The task can be accomplished by transferring knowledge across categories via semantic word embeddings. The paper proposes a novel context-aware feature generation method for zero-shot segmentation, named CaGNet. In particular, observing that a pixel-level feature highly depends on its contextual information, a contextual module is inserted into the segmentation network to capture pixel-level context, which guides the process of generating more diverse, context-aware features from semantic word embeddings.
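As a rough PyTorch sketch of this idea (not the actual CaGNet code), one could condition a pixel-wise feature generator on a class word embedding, a per-pixel context code produced by a contextual module, and random noise; all module names and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContextModule(nn.Module):
    """Toy contextual module: summarizes multi-scale context around each pixel."""
    def __init__(self, feat_dim=256, ctx_dim=128):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(feat_dim, ctx_dim, 3, padding=d, dilation=d)
            for d in (1, 2, 4)])
        self.fuse = nn.Conv2d(3 * ctx_dim, ctx_dim, 1)

    def forward(self, feats):                        # feats: (B, C, H, W)
        ctx = torch.cat([c(feats) for c in self.convs], dim=1)
        return self.fuse(ctx)                        # (B, ctx_dim, H, W)

class ContextAwareGenerator(nn.Module):
    """Generates pixel-wise features from a word embedding + pixel context + noise."""
    def __init__(self, word_dim=300, ctx_dim=128, noise_dim=32, feat_dim=256):
        super().__init__()
        self.noise_dim = noise_dim
        self.mlp = nn.Sequential(
            nn.Conv2d(word_dim + ctx_dim + noise_dim, feat_dim, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 1))

    def forward(self, word_emb, context):
        # word_emb: (B, word_dim) class embedding, e.g. word2vec of the label name.
        b, _, h, w = context.shape
        word = word_emb[:, :, None, None].expand(b, -1, h, w)
        noise = torch.randn(b, self.noise_dim, h, w, device=context.device)
        return self.mlp(torch.cat([word, context, noise], dim=1))

# ctx = ContextModule()(torch.randn(2, 256, 32, 32))
# fake_feats = ContextAwareGenerator()(torch.randn(2, 300), ctx)  # (2, 256, 32, 32)
```

Features generated this way for unseen classes can then be used to train the final pixel classifier, which is how knowledge is transferred from seen to unseen categories.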