NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, a realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, collected from 106 distinct subjects and containing more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes, including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset and show the advantage of applying deep learning methods to 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework for this task, which yields promising results for the recognition of novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]

https://arxiv.org/abs/1905.04757
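For readers who want to poke at the data, here is a minimal loader sketch for the plain-text .skeleton files. It assumes the token layout of the original NTU RGB+D release (frame count, per-frame body count, a 10-value body descriptor, a joint count, then 12 values per joint); the 120-class extension is expected to share this layout, but the field names and the example filename below are assumptions, not an official specification.

```python
import numpy as np

def read_skeleton(path):
    """Parse a plain-text .skeleton file into per-frame joint arrays.

    Assumed token layout (as in the original NTU RGB+D release):
    frame count; then for each frame a body count; for each body a
    10-value descriptor, a joint count, and 12 values per joint
    (x, y, z, depth/colour coordinates, orientation, tracking state).
    """
    with open(path) as f:
        tokens = iter(f.read().split())

    next_int = lambda: int(next(tokens))
    next_float = lambda: float(next(tokens))

    frames = []
    for _ in range(next_int()):                 # number of frames
        bodies = []
        for _ in range(next_int()):             # bodies tracked in this frame
            descriptor = [next(tokens) for _ in range(10)]  # bodyID, flags, lean, tracking state
            joint_count = next_int()            # usually 25 Kinect v2 joints
            joints = np.array(
                [[next_float() for _ in range(12)] for _ in range(joint_count)]
            )
            bodies.append({"descriptor": descriptor,
                           "xyz": joints[:, :3]})  # keep only the 3D coordinates here
        frames.append(bodies)
    return frames

# Hypothetical usage: 3D joints of the first body in the first frame.
# frames = read_skeleton("S001C001P001R001A001.skeleton")
# print(frames[0][0]["xyz"].shape)   # -> (25, 3)
```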

Depth-based human activity analysis has made encouraging progress, and 3D representations have proven effective for action recognition. Existing depth-based and RGB+D datasets all have shortcomings to varying degrees, most notably the lack of large-scale training samples, a large set of action classes, multiple camera views, varied environmental conditions, and a sufficient pool of subjects. This paper introduces a large-scale RGB+D action recognition dataset with 106 distinct subjects, more than 114,000 video samples, and over 8 million frames. The dataset covers 120 different action classes, including daily actions, mutual (two-person) actions, and health-related actions. The authors evaluate a series of existing 3D action analysis models on the dataset and conclude that deep learning methods have a clear advantage for 3D action recognition. They also study a one-shot action recognition task on the dataset and obtain promising results. They believe this large-scale dataset will be useful to the community and help address the data-hungry nature of modern learning methods.
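The one-shot protocol asks a model trained only on the auxiliary classes to recognise each novel class from a single exemplar video. The sketch below is not the paper's APSR framework; it is just a generic nearest-neighbour baseline over pre-computed sequence embeddings (the embeddings, class names, and the notion of an embedding network are placeholders) to make the evaluation setup concrete.

```python
import numpy as np

def one_shot_predict(exemplar_embeddings, query_embedding):
    """Nearest-neighbour one-shot classification by cosine similarity.

    exemplar_embeddings: dict mapping each novel class name to the embedding
    of its single exemplar sequence (e.g. produced by a network trained only
    on the auxiliary classes).
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    scores = {label: cosine(emb, query_embedding)
              for label, emb in exemplar_embeddings.items()}
    return max(scores, key=scores.get)

# Placeholder usage with random vectors standing in for real skeleton-sequence
# embeddings; in practice these would come from a model trained on the
# auxiliary classes.
rng = np.random.default_rng(0)
exemplars = {"novel_action_%d" % i: rng.normal(size=128) for i in range(20)}
query = exemplars["novel_action_3"] + 0.1 * rng.normal(size=128)
print(one_shot_predict(exemplars, query))   # -> "novel_action_3" (most likely)
```

Cosine similarity over fixed embeddings is only one choice of matching rule; any metric learned on the auxiliary classes could be swapped in.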
