Tag archive: Survey Paper

Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision

Transformer architectures have brought about fundamental changes to the field of computational linguistics, which had been dominated by recurrent neural networks for many years. Their success also implies drastic changes in cross-modal tasks involving language and vision, and many researchers have already tackled this issue. In this paper, we review some of the most critical milestones in the field, as well as overall trends in how the transformer architecture has been incorporated into visuolinguistic cross-modal tasks. Furthermore, we discuss its current limitations and speculate on some of the prospects that we find imminent.

https://arxiv.org/abs/2103.04037

The Transformer architecture has brought fundamental changes to computational linguistics, a field long dominated by recurrent neural networks. Its success also signals significant changes underway in cross-modal tasks involving vision and language, and many researchers have already begun working on these problems. This paper reviews several milestones in the field and the overall trends in how the Transformer has evolved within cross-modal tasks, then discusses the current limitations of the Transformer architecture and offers an outlook on the future.

A Survey on Visual Transformer

Transformer is a type of deep neural network mainly based on the self-attention mechanism, which was originally applied in the natural language processing field. Inspired by the strong representation ability of transformers, researchers have proposed extending them to computer vision tasks. Transformer-based models show competitive and even better performance on various visual benchmarks compared with other network types such as convolutional and recurrent networks. In this paper, we provide a literature review of these visual transformer models, categorizing them by task and analyzing the advantages and disadvantages of these methods. In particular, the main categories include basic image classification, high-level vision, low-level vision, and video processing. Self-attention in computer vision is also briefly revisited, as self-attention is the base component of the transformer. Efficient transformer methods are included for pushing transformers into real applications. Finally, we discuss further research directions for visual transformers.

https://arxiv.org/abs/2012.12556

This paper is a survey of visual Transformers.
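Since the abstract singles out self-attention as the base component of the transformer, a minimal sketch may help. The function below is an illustrative NumPy implementation of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V; the variable names and the toy patch data are our own, not from the surveyed papers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # each output mixes all inputs

# Toy example: 4 tokens (e.g. image patches), embedding dim 8
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8): every token is re-expressed as a weighted sum of all tokens
```

The key property for vision tasks is that every token attends to every other token in one step, giving a global receptive field, in contrast to the local windows of convolutional networks.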

Learning from Very Few Samples: A Survey

Few-sample learning (FSL) is significant and challenging in the field of machine learning. The capability to learn and generalize successfully from very few samples is a noticeable demarcation separating artificial intelligence from human intelligence, since humans can readily establish cognition of a novel concept from just a single or a handful of examples, whereas machine learning algorithms typically entail hundreds or thousands of supervised samples to guarantee generalization ability. Despite a long history dating back to the early 2000s and widespread attention in recent years with booming deep learning technologies, few surveys or reviews of FSL have been available until now. In this context, we extensively review 200+ FSL papers spanning from the 2000s to 2019 and provide a timely and comprehensive survey of FSL. In this survey, we review the evolution history as well as the current progress of FSL, categorize FSL approaches into generative-model-based and discriminative-model-based kinds in principle, and place particular emphasis on meta-learning-based FSL approaches. We also summarize several recently emerging extensional topics of FSL and review the latest advances on these topics. Furthermore, we highlight important FSL applications covering many research hotspots in computer vision, natural language processing, audio and speech, reinforcement learning and robotics, data analysis, etc. Finally, we conclude the survey with a discussion of promising trends in the hope of providing guidance and insights for follow-up research.

This paper is a survey of few-shot learning. In machine learning, few-shot learning is an extremely difficult and challenging task. The ability to learn and generalize under few-sample conditions has long been one of the benchmarks for measuring how far artificial intelligence lags behind human intelligence, because humans can quickly form an understanding of a new concept from only a few examples, whereas AI models typically need hundreds or thousands of samples to achieve reliable performance. From the early 2000s to the present, few-shot learning has attracted growing attention with the development of deep learning, yet survey papers on it remain scarce. In this paper, we extensively review a large body of literature and summarize the history and development trends of few-shot learning.
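The meta-learning-based FSL approaches that the survey emphasizes are typically trained and evaluated on N-way K-shot "episodes": each episode holds out N classes, with K labeled support examples and a handful of query examples per class. A minimal episode sampler, under our own toy data layout (the function name and dataset are illustrative, not from the paper), might look like:

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, q_queries=5):
    """Sample one N-way K-shot episode: a labeled support set and query set."""
    classes = random.sample(list(data_by_class), n_way)   # pick N held-out classes
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]   # K shots per class
        query += [(x, label) for x in examples[k_shot:]]     # evaluated on these
    return support, query

# Toy dataset: 10 classes with 20 "samples" each (integers stand in for images)
data = {f"class_{c}": list(range(c * 100, c * 100 + 20)) for c in range(10)}
support, query = sample_episode(data, n_way=5, k_shot=1, q_queries=5)
print(len(support), len(query))  # 5 25
```

A meta-learner is trained across many such episodes so that, at test time, it can classify the query examples of entirely unseen classes from only the K support examples, which is the few-sample generalization ability the survey discusses.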