Hierarchical Neural Memory Network for Low Latency Event Processing

Abstract

This paper proposes a low latency neural network architecture for event-based dense prediction tasks.

Conventional architectures encode entire scene contents at a fixed rate regardless of their temporal characteristics. Instead, the proposed network encodes contents at an adaptive rate depending on their movement speed. We achieve this by constructing a temporal hierarchy using stacked latent memories that operate at different rates. Given low latency event streams, the multi-level memories gradually extract dynamic to static scene contents by propagating information from the fast to the slow memory modules. The architecture not only reduces the redundancy of conventional architectures but also exploits long-term dependencies. Furthermore, an attention-based event representation efficiently encodes sparse event streams into the memory cells.

We conduct extensive evaluations on three event-based dense prediction tasks, where the proposed approach outperforms existing methods in both accuracy and latency while demonstrating effective event and image fusion capabilities.

Hierarchical Neural Memory Network

Our idea for low latency event processing is a multi-rate network architecture. A scene often contains objects with varying motion speeds. A network should run fast for high-speed motions, but conduct careful or global reasoning for slowly moving objects or scene context analysis.

To achieve this, the proposed network builds a temporal hierarchy using multi-level latent memories $\{\boldsymbol{z}_1,\boldsymbol{z}_2,\boldsymbol{z}_3\}$ that operate in parallel at different rates.

The memories are stacked such that their operating rate decreases from $\boldsymbol{z}_1$ to $\boldsymbol{z}_L$ (here $L = 3$). $\boldsymbol{z}_1$ writes incoming events into its state (event-write) and quickly extracts local and dynamic information with a shallow network ($F_u$). The features are then propagated to higher memories ($F_{w}^{\uparrow}$), where global and static information is extracted with deeper networks ($F_u$). The network also has a top-down path ($F_{w}^{\downarrow}$) that lets low-level memories exploit contextual information to recognize dynamic motion accurately. At the end of its operating cycle, each memory computes output features ($F_{ro}$) and puts them into a latent buffer. At every time step, the task head computes predictions from the features inside the latent buffer, exploiting low latency information and global context simultaneously.
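
The multi-rate schedule described above can be illustrated with a minimal Python sketch. It is not the HMNet implementation: the update periods `(1, 3, 9)`, the scalar states, and the simple arithmetic standing in for $F_u$, $F_{w}^{\uparrow}$, $F_{w}^{\downarrow}$, and $F_{ro}$ are all assumptions chosen only to show the control flow. The names `Memory` and `run` are hypothetical. The sketch also processes levels sequentially within a time step, whereas the memories in the paper operate in parallel.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    level: int            # 1 = fastest memory, higher = slower
    period: int           # update once every `period` steps (assumed value)
    state: float = 0.0    # scalar stand-in for the latent state z_l
    pending: float = 0.0  # messages written into this memory since its last update


def run(events, periods=(1, 3, 9)):
    """Simulate a multi-rate memory stack over a sequence of event values.

    Returns a list of (time_step, updated_levels, prediction) tuples.
    """
    memories = [Memory(level=i + 1, period=p) for i, p in enumerate(periods)]
    latent_buffer = [0.0] * len(memories)  # latest readout of each memory
    log = []
    for t, event in enumerate(events, start=1):
        # Event-write: incoming events go only into the fastest memory z_1.
        memories[0].pending += event
        updated = []
        for i, mem in enumerate(memories):
            if t % mem.period != 0:
                continue  # this memory is not at the end of its cycle
            # Stand-in for F_u: decay the old state and absorb pending messages.
            mem.state = 0.5 * mem.state + mem.pending
            mem.pending = 0.0
            # Stand-in for F_ro: place the readout in the latent buffer.
            latent_buffer[i] = mem.state
            # Stand-in for the bottom-up write into the next slower memory.
            if i + 1 < len(memories):
                memories[i + 1].pending += mem.state
            # Stand-in for the top-down write into the next faster memory.
            if i > 0:
                memories[i - 1].pending += 0.1 * mem.state
            updated.append(mem.level)
        # Task head stand-in: runs at every step on whatever the buffer holds.
        prediction = sum(latent_buffer)
        log.append((t, updated, prediction))
    return log


if __name__ == "__main__":
    for t, updated, prediction in run([1.0] * 9):
        print(f"t={t} updated levels={updated} prediction={prediction:.3f}")
```

With these assumed periods, level 1 updates at every step, level 2 at steps 3, 6, and 9, and level 3 only at step 9, yet a prediction is produced at every step from the most recent contents of the latent buffer. This is the property that lets the fast path set the latency while the slow paths contribute context.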

Results

The proposed HMNet outperforms existing methods while reducing latency by 40%-50%.

Results on Semantic Segmentation (DSEC-Semantic dataset)

Results on Object Detection (GEN1 dataset)

Datasets

We appreciate the following works for releasing the event camera datasets used in our work:

DSEC-Semantic
Z. Sun, N. Messikommer, D. Gehrig, and D. Scaramuzza. ESS: Learning Event-based Semantic Segmentation from Still Images. ECCV, 2022.
M. Gehrig, W. Aarents, D. Gehrig, and D. Scaramuzza. DSEC: A Stereo Event Camera Dataset for Driving Scenarios. IEEE Robotics and Automation Letters, 2021.

GEN1
P. de Tournemire, D. Nitti, E. Perot, and A. Sironi. A Large Scale Event-based Detection Dataset for Automotive. CoRR, 2020.

EventScape
D. Gehrig, M. Rüegg, M. Gehrig, J. Hidalgo-Carrio, and D. Scaramuzza. Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction. IEEE Robotics and Automation Letters, 2021.

MVSEC
A. Z. Zhu, D. Thakur, T. Ozaslan, B. Pfrommer, V. Kumar, and K. Daniilidis. The Multi Vehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception. IEEE Robotics and Automation Letters, 2018.

Acknowledgement

This work is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

BibTeX

@article{hamaguchi2023hmnet,
  author    = {Ryuhei Hamaguchi and Yasutaka Furukawa and Masaki Onishi and Ken Sakurada},
  title     = {Hierarchical Neural Memory Network for Low Latency Event Processing},
  journal   = {CVPR},
  year      = {2023},
}