Event-based Vision, Event Cameras, Event Camera SLAM

Event cameras, such as the Dynamic Vision Sensor (DVS), are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages: a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the output is composed of a sequence of asynchronous events rather than actual intensity images, traditional vision algorithms cannot be applied, so new algorithms that exploit the high temporal resolution and the asynchronous nature of the sensor are required.

Do you want to know more about event cameras or play with them? Check out our tutorial on event cameras (PDF, PPT), our event-camera dataset, which also includes intensity images, IMU, ground truth, synthetic data, as well as an event-camera simulator, and our collection of event-based vision resources, which we started in order to gather information about this exciting field.

Estimating neural radiance fields (NeRFs) from "ideal" images has been extensively studied in the computer vision community. Most approaches assume optimal illumination and slow camera motion. These assumptions are often violated in robotic applications, where images may contain motion blur and the scene may not have suitable illumination. This can cause significant problems for downstream tasks such as navigation, inspection, or visualization of the scene. To alleviate these problems, we present E-NeRF, the first method that estimates a volumetric scene representation in the form of a NeRF from a fast-moving event camera. Our method can recover NeRFs during very fast motion and in high-dynamic-range conditions where frame-based approaches fail. We show that rendering high-quality frames is possible by providing only an event stream as input. Furthermore, by combining events and frames, we can estimate NeRFs of higher quality than state-of-the-art approaches under severe motion blur. We also show that combining events and frames can overcome failure cases of NeRF estimation in scenarios where only a few input views are available, without requiring additional regularization.

State-of-the-art event-based deep learning methods typically need to convert raw events into dense input representations before they can be processed by standard networks. However, selecting this representation is very expensive, since it requires training a separate neural network for each candidate representation and comparing the validation scores. In this work, we circumvent this bottleneck by measuring the quality of event representations with the Gromov-Wasserstein Discrepancy (GWD), which is 200 times faster to compute. We validated extensively, on multiple datasets and neural network backbones, that the performance of neural networks trained with a representation correlates perfectly with its GWD. We then used this metric to, for the first time, optimize over a large family of representations, revealing a new, powerful representation: ERGO-12. With it, we outperform state-of-the-art representations by 1.9% mAP on the 1 Mpx dataset and 8.6% mAP on the Gen1 dataset. We even outperform the state of the art by 1.8% mAP on Gen1 and state-of-the-art feed-forward methods by 6.0% mAP on the 1 Mpx dataset. This work thus opens a new, unexplored field of explicit representation optimization that will push the limits of event-based learning methods.

Event cameras provide visual information with sub-millisecond latency, at a high dynamic range, and with strong robustness against motion blur. These unique properties offer great potential for low-latency object detection and tracking in time-critical scenarios. We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras.
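The abstracts above repeatedly refer to converting the asynchronous event stream into a dense tensor before feeding it to a standard network. As a concrete illustration, here is a minimal sketch of one common such conversion, a time-binned polarity histogram (voxel grid). This is only an illustrative example of the events-to-tensor step; it is not the ERGO-12 representation or any specific representation from these papers.

```python
import numpy as np

def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
    """Accumulate asynchronous events (x, y, t, polarity) into a dense
    voxel grid with `num_bins` temporal channels.

    Illustrative only: a time-binned signed-polarity histogram, one
    common dense representation, not the one proposed in the papers.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    # Normalize timestamps to [0, num_bins) so each event lands in a bin.
    t = (t - t[0]) / max(t[-1] - t[0], 1e-9) * (num_bins - 1e-6)
    bins = t.astype(np.int64)
    # Signed polarity: +1 for ON events, -1 for OFF events.
    np.add.at(grid, (bins, y, x), 2.0 * p.astype(np.float32) - 1.0)
    return grid

# Example: 1000 random events on a 480x640 sensor over 50 ms.
rng = np.random.default_rng(0)
n = 1000
x = rng.integers(0, 640, n)
y = rng.integers(0, 480, n)
t = np.sort(rng.uniform(0.0, 0.05, n))
p = rng.integers(0, 2, n)  # 0 = OFF, 1 = ON
voxels = events_to_voxel_grid(x, y, t, p, num_bins=5, height=480, width=640)
print(voxels.shape)  # (5, 480, 640)
```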
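The ERGO-12 abstract scores candidate representations with the Gromov-Wasserstein Discrepancy instead of training one network per candidate. The exact construction used in the paper is not given above, so the sketch below only shows the general mechanics of such a score using the POT (Python Optimal Transport) library; treating both the raw events and the representation as point clouds, and the candidate names, are our assumptions for illustration.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def gw_discrepancy(events_pts, repr_pts):
    """Gromov-Wasserstein discrepancy between two point clouds: raw
    events (x, y, t) and points derived from a dense representation.

    How the paper builds these point sets is not detailed in the
    abstract; this only sketches the scoring step.
    """
    # Intra-domain distance matrices: GW compares metric structures,
    # so the two clouds may live in different spaces.
    C1 = ot.dist(events_pts, events_pts, metric='euclidean')
    C2 = ot.dist(repr_pts, repr_pts, metric='euclidean')
    C1 /= C1.max()
    C2 /= C2.max()
    p = ot.unif(len(events_pts))
    q = ot.unif(len(repr_pts))
    return ot.gromov.gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss')

# Rank candidate representations: per the reported correlation, a lower
# GWD should indicate better downstream detection accuracy.
rng = np.random.default_rng(0)
events_pts = rng.random((200, 3))  # (x, y, t), normalized; toy data
candidates = {name: rng.random((150, 3)) for name in ['voxel', 'histogram']}
scores = {name: gw_discrepancy(events_pts, pts) for name, pts in candidates.items()}
print(sorted(scores.items(), key=lambda kv: kv[1]))
```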
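E-NeRF supervises a NeRF directly with events. A standard event-generation model says an event fires whenever the log intensity at a pixel changes by a contrast threshold C, so the signed event count accumulated between two timestamps constrains the log-intensity difference between the two rendered views. The sketch below shows a loss of that general form; it is our reading of this idea under that model, not the paper's exact objective, and the function name and `contrast` value are placeholders.

```python
import torch

def event_log_intensity_loss(I_t0, I_t1, event_count, contrast=0.25, eps=1e-6):
    """Event-supervision sketch: under the standard event-generation
    model, the accumulated signed event count in [t0, t1] times the
    contrast threshold C should match the per-pixel log-intensity
    difference of the frames rendered by the NeRF at t0 and t1.

    I_t0, I_t1:  rendered intensities, shape (H, W), values in (0, 1].
    event_count: signed per-pixel event count accumulated in [t0, t1].
    contrast:    assumed contrast threshold C (a sensor parameter).
    """
    pred_change = torch.log(I_t1 + eps) - torch.log(I_t0 + eps)
    target_change = contrast * event_count
    return torch.mean((pred_change - target_change) ** 2)

# Toy usage with random tensors standing in for NeRF renderings.
H, W = 64, 64
I_t0 = torch.rand(H, W).clamp(min=1e-3)
I_t1 = torch.rand(H, W).clamp(min=1e-3)
event_count = torch.randint(-3, 4, (H, W)).float()
print(event_log_intensity_loss(I_t0, I_t1, event_count))
```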