Deep Attention Network for Egocentric Action Recognition

Recognizing a camera wearer’s actions from videos captured by an egocentric camera is a challenging task. In this paper, we employ a two-stream deep neural network composed of an appearance-based stream and a motion-based stream to recognize egocentric actions. Based on the insight that human action and gaze behavior are highly coordinated in object manipulation tasks, we propose a spatial attention network that predicts human gaze in the form of an attention map. The attention map helps each of the two streams focus on the most relevant spatial region of the video frames when predicting actions. To better model the temporal structure of the videos, a temporal network is proposed. The temporal network incorporates a bi-directional long short-term memory to model long-range dependencies for recognizing egocentric actions. The experimental results demonstrate that our method is able to predict attention maps that are consistent with human attention, and it achieves action recognition performance competitive with state-of-the-art methods on the GTEA Gaze and GTEA Gaze+ datasets.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8653357

This paper proposes a two-stream network that recognizes specific actions by learning motion and appearance information. The motivation is that human action and gaze behavior are coordinated during object-manipulation tasks, so the authors propose a spatial attention network that predicts an attention map of human gaze; this attention map helps the two streams concentrate on the most relevant spatial regions. To better model the temporal structure, the paper proposes a temporal network that uses a bi-directional LSTM to capture long-range dependencies for action recognition. Experimental results show that the method can effectively produce attention maps and achieves competitive performance.
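The pipeline described above can be sketched roughly as follows. This is a minimal PyTorch illustration under assumed names and feature shapes, not the authors' implementation: a 1×1 convolution predicts a per-frame spatial attention map, the map reweights the appearance and motion feature maps before spatial pooling, and a bi-directional LSTM aggregates the pooled features over time.

```python
# Hypothetical sketch of the described architecture (names, dimensions,
# and pooling choices are assumptions, not the paper's exact design).
import torch
import torch.nn as nn

class AttentionTwoStreamNet(nn.Module):
    def __init__(self, feat_dim=64, hidden=32, n_classes=10):
        super().__init__()
        # 1x1 conv predicts a single-channel spatial attention map per frame
        self.attn = nn.Conv2d(feat_dim, 1, kernel_size=1)
        # BiLSTM over the concatenated, attention-pooled two-stream features
        self.lstm = nn.LSTM(feat_dim * 2, hidden,
                            batch_first=True, bidirectional=True)
        self.cls = nn.Linear(hidden * 2, n_classes)

    def attn_pool(self, feat):
        # feat: (B*T, C, H, W) -> attention-weighted spatial average (B*T, C)
        a = torch.softmax(self.attn(feat).flatten(2), dim=-1)  # (B*T, 1, H*W)
        return (feat.flatten(2) * a).sum(-1)

    def forward(self, rgb_feat, flow_feat):
        # rgb_feat / flow_feat: (B, T, C, H, W) appearance / motion features
        B, T, C, H, W = rgb_feat.shape
        r = self.attn_pool(rgb_feat.reshape(B * T, C, H, W)).reshape(B, T, C)
        f = self.attn_pool(flow_feat.reshape(B * T, C, H, W)).reshape(B, T, C)
        out, _ = self.lstm(torch.cat([r, f], dim=-1))  # (B, T, 2*hidden)
        return self.cls(out.mean(1))                   # clip-level logits

net = AttentionTwoStreamNet()
rgb = torch.randn(2, 8, 64, 7, 7)    # batch of 2 clips, 8 frames each
flow = torch.randn(2, 8, 64, 7, 7)
logits = net(rgb, flow)
print(logits.shape)  # torch.Size([2, 10])
```

The softmax over flattened spatial positions makes the attention map a proper distribution per frame, so the pooled vector is a convex combination of spatial features, which is one common way to realize "focus on the most relevant spatial region".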
