Robust 3D Object Detection from LiDAR Point Cloud Data with Spatial Information Aggregation
Authors: Guus Engels, Ignacio Argandas
Date: 16.09.2020
Abstract
Current 3D object detectors that operate on Bird’s Eye View (BEV) LiDAR point cloud data rely on Convolutional Neural Networks (CNNs), which were originally designed for camera images. As a result, they look for the same target features regardless of the position of the objects with respect to the sensor. Discarding this spatial information undermines the reliability and robustness of 3D object detection, because objects in LiDAR point clouds have distance-dependent features: the position of a group of points can be decisive in determining whether they represent an object. To address this, we propose a network extension, the FeatExt operation, that makes the model aware of both the target objects’ features and their spatial location. The FeatExt operation expands a group of feature maps extracted from a BEV representation to include the distance to a specific position of interest in the scene, in this case the distance to the LiDAR. When the proposed operation is added to a baseline network in an intermediate fusion fashion, it yields an average precision boost of up to 8.9 on the KITTI BEV benchmark. Our proposal can easily be added to existing object detection networks to improve them.
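The idea of expanding BEV feature maps with a distance to a position of interest can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the distance is encoded or fused, so the sketch assumes a single extra channel holding the Euclidean distance from each grid cell to the sensor, concatenated to the existing feature maps. The function name `append_distance_channel` and the parameters `origin_rc` and `cell_size` are hypothetical.

```python
import numpy as np


def append_distance_channel(features, origin_rc, cell_size=1.0):
    """Concatenate a per-cell distance map to BEV feature maps.

    Assumptions (not taken from the paper): one extra channel, Euclidean
    distance, square grid cells.

    features:  array of shape (C, H, W), feature maps on a BEV grid.
    origin_rc: (row, col) of the position of interest, e.g. the LiDAR,
               in grid coordinates; it may lie outside the grid.
    cell_size: side length of one grid cell, e.g. in metres.
    Returns an array of shape (C + 1, H, W).
    """
    _, height, width = features.shape
    # Row and column index of every cell, each of shape (H, W).
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    # Euclidean distance from each cell to the position of interest.
    distance = cell_size * np.hypot(rows - origin_rc[0], cols - origin_rc[1])
    # Match the feature dtype (floating-point features are assumed).
    distance = distance.astype(features.dtype)
    # Add the distance map as one additional channel.
    return np.concatenate([features, distance[None]], axis=0)


features = np.zeros((4, 3, 5), dtype=np.float32)
extended = append_distance_channel(features, origin_rc=(0, 0), cell_size=0.5)
print(extended.shape)
```

Later layers that consume `extended` can then condition on how far each cell is from the sensor, which is the kind of spatial awareness the abstract describes.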
BIB_text
title = {Robust 3D Object Detection from LiDAR Point Cloud Data with Spatial Information Aggregation},
pages = {813-823},
keywds = {
3D object detection, LiDAR, feature extraction
}
abstract = {
Current 3D object detectors that operate on Bird’s Eye View (BEV) LiDAR point cloud data rely on Convolutional Neural Networks (CNNs), which were originally designed for camera images. As a result, they look for the same target features regardless of the position of the objects with respect to the sensor. Discarding this spatial information undermines the reliability and robustness of 3D object detection, because objects in LiDAR point clouds have distance-dependent features: the position of a group of points can be decisive in determining whether they represent an object. To address this, we propose a network extension, the FeatExt operation, that makes the model aware of both the target objects’ features and their spatial location. The FeatExt operation expands a group of feature maps extracted from a BEV representation to include the distance to a specific position of interest in the scene, in this case the distance to the LiDAR. When the proposed operation is added to a baseline network in an intermediate fusion fashion, it yields an average precision boost of up to 8.9 on the KITTI BEV benchmark. Our proposal can easily be added to existing object detection networks to improve them.
}
isbn = {978-303057801-5},
date = {2020-09-16},
}