Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments
Jens Behley and Cyrill Stachniss

Abstract — Accurate and reliable localization and mapping is a fundamental building block for most autonomous robots. For this purpose, we propose a novel, dense approach to laser-based mapping that operates on three-dimensional point clouds obtained from rotating laser sensors. We construct a surfel-based map and estimate the changes in the robot's pose by exploiting the projective data association between the current scan and a rendered model view from that surfel map. For detection and verification of a loop closure, we leverage the map representation to compose a virtual view of the map before a potential loop closure, which enables a more robust detection even with low overlap between the scan and the already mapped areas. Our approach is efficient and enables real-time capable registration. At the same time, it is able to detect loop closures and to perform map updates in an online fashion. Our experiments show that we are able to estimate globally consistent maps in large scale environments solely based on point cloud data.

### Data

Here, we provide configuration and poses files for our implementation to reproduce the presented results. For the sensor poses, we follow the KITTI convention, i.e., every line in the file corresponds to the entries of a homogenous transformation $\mathbf{T} \in \mathbb{R}^{4 \times 4}$ in row-major format, where the last row $(0, 0, 0, 1)$ has been omitted.

Note, we provide only trajectories of the testset for our approach with loop closures. However, you can generate the trajectories yourself using the configuration for the other approaches.

Approach config poses
Frame-to-Frame
Frame-to-Model (no loop closure)
Frame-to-Model (with loop closure)

### Implementation Details

We implemented our approach using only OpenGL core profile functionality. This enables us to run our approach on all common GPUs (including Nvidia, AMD, but also Intel). However, this decision entails also some restrictions on how you can use the library.

1. Single thread: Despite using the GPU for fast computations, the approach must run in a single thread. Therefore, it is only possible to use the library in the same thread in which the OpenGL context was created. This is particularly imported with ROS, where callbacks are usually run in separate threads, i.e., the context initialization must happen inside the callback.
2. There must be a OpenGL context available. For linux, this means that a X server with a display available or must be "faked" via Xvfb. This is still needed even if no visual output is created.

With the restriction of only using OpenGL, we also had to find ways to perform some computations in the different shader stages. Fast computation of the terms $\mathbf{J^T W J}$ and $\mathbf{J^Tr}$ with plain OpenGL was the most challenging part. Here, we exploited blending, glEnable(GL_BLEND), to compute the sum via the geometry shader, which generates the corresponding pixel positions to fill the appropriate entries of the framebuffers output texture. For each vertex coming from the vertex shader, the geometry shader generates all $6\times 6 + 6 \times 1$ entries of the needed terms, which are then summed via blending in the fragment shader.

However, just naively passing each vertex from the vertex map through the geometry shader results in inferior performance of the overall computation. For a significant speedup, reducing the average computation time from $1.5\,ms$ to $0.3\,ms$, we compute partial sums inside the geometry shader. We suppose that this reduces the access conflicts of the blending operation, since multiple GPU threads try to access the same memory locations and therefore leads to serialization of the pixel operations in the fragmet shader.

The left figure shows the avgerage runtime for different number of partial sum entries in the geometry shader. Clearly visible is an optimal size of $64$ entries in the partial sum. Also visible is a significantly reduced maximal computation time. This improvement lead overall to a considerable reduction of the computation time per timestep and also helped to get near real-time performance of the loop closure detection and verification.

### Citation

@inproceedings{behley2018rss,
author = {Jens Behley and Cyrill Stachniss},
title  = {Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments},
booktitle = {Proc.~of Robotics: Science and Systems~(RSS)},
year = {2018}
}