Tag Archive: Image Recognition

An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale


While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches. When pre-trained on large amounts of data and transferred to multiple recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc), Vision Transformer attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.


While the Transformer has become the standard for NLP tasks, its application in computer vision remains limited. In vision, attention is either used together with convolutional networks, or used to replace certain components of a network without changing its overall structure. In this paper we present an architecture that uses a pure transformer instead of CNNs, performing image classification by taking a sequence of image patches as input.
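As a rough sketch of the patch tokenization the abstract describes (not the paper's code; the image size and patch size below are only illustrative), an image can be cut into non-overlapping 16×16 patches, each flattened into a token vector:

```python
import numpy as np

def image_to_patch_tokens(image, patch_size=16):
    """Split an (H, W, C) image into non-overlapping patches and
    flatten each patch into a token vector."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    ph, pw = h // patch_size, w // patch_size
    # (ph, p, pw, p, c) -> (ph, pw, p, p, c) -> (ph*pw, p*p*c)
    patches = image.reshape(ph, patch_size, pw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)
    return patches

tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): 14*14 tokens, each of dimension 16*16*3
```

The resulting token sequence is what a standard transformer encoder would then consume, with a learned position embedding per patch.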

One-Shot Learning for Deformable Medical Image Registration and Periodic Motion Tracking



Deformable image registration is a very important field of research in medical imaging. Recently multiple deep learning approaches were published in this area showing promising results. However, drawbacks of deep learning methods are the need for a large amount of training datasets and their inability to register unseen images different from the training datasets. One-shot learning comes without the need of large training datasets and has already been proven to be applicable to 3D data. In this work we present a one-shot registration approach for periodic motion tracking in 3D and 4D datasets. When applied to a 3D dataset the algorithm calculates the inverse of the registration vector field simultaneously. For registration we employed a U-Net combined with a coarse-to-fine approach and a differentiable spatial transformer module. The algorithm was thoroughly tested with multiple 4D and 3D datasets publicly available. The results show that the presented approach is able to track periodic motion and to yield a competitive registration accuracy. Possible applications are the use as a stand-alone algorithm for 3D and 4D motion tracking or in the beginning of studies until enough datasets for a separate training phase are available.

Deformable image registration is a very important research area in medical imaging. Many deep learning methods recently published in this area have shown promising results. However, the need for large amounts of training data and the inability to register images unseen during training remain drawbacks of deep learning. One-shot learning does not require large training datasets and has already been proven applicable to 3D data. In this work, a one-shot registration approach is presented for tracking periodic motion in 3D and 4D datasets. When applied to a 3D dataset, the algorithm simultaneously computes the inverse of the registration vector field. For registration, a U-Net is combined with a coarse-to-fine approach and a differentiable spatial transformer module. The algorithm was thoroughly tested on multiple publicly available 4D and 3D datasets. The results show that the proposed method can track periodic motion and achieve competitive registration accuracy. Possible applications include use as a stand-alone algorithm for 3D and 4D motion tracking, or at the start of a study until enough datasets are available for a separate training phase.
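The registration network itself (U-Net, coarse-to-fine, differentiable spatial transformer) is beyond a short snippet, but the final warping step can be loosely illustrated. The sketch below applies a dense displacement field to a 2D image with nearest-neighbour sampling; it is a simplified, non-differentiable stand-in, not the paper's implementation:

```python
import numpy as np

def warp_image(image, field):
    """Apply a dense displacement field (H, W, 2) to a 2D image using
    nearest-neighbour sampling. An illustrative, non-differentiable
    stand-in for a spatial transformer's bilinear sampling."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # each output pixel samples the input at (y + dy, x + dx), clipped
    src_y = np.clip(np.rint(ys + field[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + field[..., 1]).astype(int), 0, w - 1)
    return image[src_y, src_x]

img = np.arange(16.0).reshape(4, 4)
field = np.zeros((4, 4, 2))
field[..., 1] = 1.0  # sample one pixel to the right everywhere
print(warp_image(img, field)[0])  # [1. 2. 3. 3.]
```

In a real registration pipeline the field would be predicted by the network and the sampling would be differentiable so gradients can flow back through the warp.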

AutoAugment: Learning Augmentation Strategies from Data

Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a Top-1 accuracy of 83.5% which is 0.4% better than the previous record of 83.1%. On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art. Augmentation policies we find are transferable between datasets. The policy learned on ImageNet transfers well to achieve significant improvements on other datasets, such as Oxford Flowers, Caltech-101, Oxford-IIIT Pets, FGVC Aircraft, and Stanford Cars.
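The policy structure described above can be sketched as follows. The operation names, probabilities, and magnitudes here are illustrative placeholders, not the paper's actual search space:

```python
import random

# Placeholder "operations": stand-ins for image-processing functions such
# as rotation or shearing (the names and signatures are assumptions).
def rotate(img, magnitude):    return f"rotate({img},{magnitude})"
def shear(img, magnitude):     return f"shear({img},{magnitude})"
def translate(img, magnitude): return f"translate({img},{magnitude})"

# A policy is a list of sub-policies; each sub-policy is two
# (operation, probability, magnitude) triples.
policy = [
    [(rotate, 0.8, 30), (shear, 0.4, 10)],
    [(translate, 0.6, 5), (rotate, 0.2, 15)],
]

def apply_policy(img, policy, rng=random):
    sub = rng.choice(policy)        # one sub-policy chosen per image
    for op, prob, magnitude in sub:
        if rng.random() < prob:     # each op fires with its own probability
            img = op(img, magnitude)
    return img

print(apply_policy("img", policy))  # e.g. "shear(rotate(img,30),10)"
```

The search algorithm's job is then to pick the operations, probabilities, and magnitudes in `policy` that maximize validation accuracy on the target dataset.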



Interpretable and Accurate Fine-grained Recognition via Region Grouping


We present an interpretable deep model for fine-grained visual recognition. At the core of our method lies the integration of region-based part discovery and attribution within a deep neural network. Our model is trained using image-level object labels, and provides an interpretation of its results via the segmentation of object parts and the identification of their contributions towards classification. To facilitate the learning of object parts without direct supervision, we explore a simple prior of the occurrence of object parts. We demonstrate that this prior, when combined with our region-based part discovery and attribution, leads to an interpretable model that remains highly accurate. Our model is evaluated on major fine-grained recognition datasets, including CUB-200 [56], CelebA [36] and iNaturalist [55]. Our results compare favorably to state-of-the-art methods on classification tasks, and our method outperforms previous approaches on the localization of object parts.
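A minimal sketch of the region-grouping idea: pixel features are softly assigned to K parts via a projection, and per-part features are obtained by assignment-weighted pooling. The shapes, names, and the plain linear projection below are assumptions for illustration, not the paper's architecture (which learns the assignments inside a deep network together with an occurrence prior):

```python
import numpy as np

def group_regions(features, part_proj):
    """features: (N, D) pixel features; part_proj: (D, K) projection.
    Softly assign each pixel to one of K parts, then pool per-part
    features using the assignment weights."""
    logits = features @ part_proj                        # (N, K)
    logits -= logits.max(axis=1, keepdims=True)          # stable softmax
    assign = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # assignment-weighted average of pixel features per part -> (K, D)
    part_feats = assign.T @ features / assign.sum(axis=0)[:, None]
    return assign, part_feats

rng = np.random.default_rng(0)
assign, part_feats = group_regions(rng.normal(size=(64, 8)),
                                   rng.normal(size=(8, 4)))
print(assign.shape, part_feats.shape)  # (64, 4) (4, 8)
```

Reshaping the (N, K) assignment back to the image grid yields soft part-segmentation maps, which is what makes the model's predictions inspectable.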