Dr. Jens Behley
Postdoctoral Researcher
Rheinische Friedrich-Wilhelms-Universität Bonn
Institute of Geodesy and Geoinformation
Nussallee 15
D-53115 Bonn, Germany
Office: 1.008
Phone: +49 (0)228 / 73-60190


A. Milioto, I. Vizzo, J. Behley, and C. Stachniss. RangeNet++: Fast and Accurate LiDAR Semantic Segmentation, Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
Abstract Perception in autonomous vehicles is often carried out through a suite of different sensing modalities. Given the massive amount of openly available labeled RGB data and the advent of high-quality deep learning algorithms for image-based recognition, high-level semantic perception tasks are predominantly solved using high-resolution cameras. As a result of that, other sensor modalities potentially useful for this task are often ignored. In this paper, we push the state of the art in LiDAR-only semantic segmentation forward in order to provide another independent source of semantic information to the vehicle. Our approach can accurately perform full semantic segmentation of LiDAR point clouds at sensor frame rate. We exploit range images as an intermediate representation in combination with a Convolutional Neural Network (CNN) exploiting the rotating LiDAR sensor model. To obtain accurate results, we propose a novel post-processing algorithm that deals with problems arising from this intermediate representation, such as discretization errors and blurry CNN outputs. We implemented and thoroughly evaluated our approach, including several comparisons to the state of the art. Our experiments show that our approach outperforms state-of-the-art approaches, while still running online on a single embedded GPU. The code can be accessed at
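The range-image representation mentioned in the abstract maps each 3D LiDAR point to a pixel via its azimuth and elevation angles. The following is a minimal sketch of such a spherical projection; the image size and field-of-view values are illustrative defaults (HDL-64E-like), not parameters taken from the paper.

```python
import numpy as np

def project_to_range_image(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Spherically project an (N, 3) LiDAR point cloud to an H x W range image.

    Sketch only: H, W, and the vertical field of view (degrees) are
    illustrative assumptions for a rotating automotive LiDAR.
    """
    fov = np.radians(fov_up - fov_down)

    depth = np.linalg.norm(points, axis=1)
    yaw = -np.arctan2(points[:, 1], points[:, 0])            # azimuth
    pitch = np.arcsin(points[:, 2] / np.maximum(depth, 1e-8))

    u = 0.5 * (yaw / np.pi + 1.0) * W                        # column in [0, W)
    v = (1.0 - (pitch - np.radians(fov_down)) / fov) * H     # row in [0, H)
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    image = np.full((H, W), -1.0, dtype=np.float32)          # -1 marks empty pixels
    # Write points far-to-near so the closest point per pixel wins.
    order = np.argsort(depth)[::-1]
    image[v[order], u[order]] = depth[order]
    return image
```

Note that projecting many points into one pixel is exactly the discretization issue the post-processing step in the paper addresses.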
X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley, and C. Stachniss. SuMa++: Efficient LiDAR-based Semantic SLAM, Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
Abstract Reliable and accurate localization and mapping are key components of most autonomous systems. Besides geometric information about the mapped environment, the semantics plays an important role to enable intelligent navigation behaviors. In most realistic environments, this task is particularly complicated due to dynamics caused by moving objects, which can corrupt the mapping step or derail localization. In this paper, we propose an extension of a recently published surfel-based mapping approach exploiting three-dimensional laser range scans by integrating semantic information to facilitate the mapping process. The semantic information is efficiently extracted by a fully convolutional neural network and rendered on a spherical projection of the laser range data. This computed semantic segmentation results in point-wise labels for the whole scan, allowing us to build a semantically-enriched map with labeled surfels. This semantic map enables us to reliably filter moving objects, but also improve the projective scan matching via semantic constraints. Our experimental evaluation on challenging highway sequences from the KITTI dataset with very few static structures and a large amount of moving cars shows the advantage of our semantic SLAM approach in comparison to a purely geometric, state-of-the-art approach.
E. Palazzolo, J. Behley, P. Lottes, P. Giguère, and C. Stachniss. ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals, Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
Abstract Mapping and localization are essential capabilities of robotic systems. Although the majority of mapping systems focus on static environments, the deployment in real-world situations requires them to handle dynamic objects. In this paper, we propose an approach for an RGB-D sensor that is able to consistently map scenes containing multiple dynamic elements. For localization and mapping, we employ an efficient direct tracking on the truncated signed distance function (TSDF) and leverage color information encoded in the TSDF to estimate the pose of the sensor. The TSDF is efficiently represented using voxel hashing, with most computations parallelized on a GPU. For detecting dynamics, we exploit the residuals obtained after an initial registration, together with the explicit modeling of free space in the model. We evaluate our approach on existing datasets, and provide a new dataset showing highly dynamic scenes. These experiments show that our approach often surpasses other state-of-the-art dense SLAM methods. We make available our dataset with the ground truth for both the trajectory of the RGB-D sensor obtained by a motion capture system and the model of the static environment using a high-precision terrestrial laser scanner. Finally, we release our approach as open source code.
J. Behley*, M. Garbade*, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences, Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
Abstract Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus a part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of a large dataset for this task which is based on an automotive LiDAR. In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360 degree field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires anticipating the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.
(* indicates equal contribution)
P. Lottes, J. Behley, N. Chebrolu, A. Milioto, and C. Stachniss. Robust joint stem detection and crop‐weed classification using image sequences for plant‐specific treatment in precision farming, Journal of Field Robotics (JFR), 2019.
Abstract Conventional farming still relies on large quantities of agrochemicals for weed management which have several negative side-effects on the environment. Autonomous robots offer the potential to reduce the amount of chemicals applied, as robots can monitor and treat each plant in the field individually, thereby circumventing the uniform chemical treatment of the whole field. Such agricultural robots need the ability to identify individual crops and weeds in the field using sensor data and must additionally select effective treatment methods based on the type of weed. For example, certain types of weeds can only be effectively treated mechanically due to their resistance to herbicides, whereas other types can be treated through selective spraying. In this article, we present a novel system that provides the necessary information for effective plant-specific treatment. It estimates the stem location for weeds, which enables the robots to perform precise mechanical treatment, and at the same time provides the pixel-accurate area covered by weeds for treatment through selective spraying. The major challenge in developing such a system is the large variability in the visual appearance that occurs in different fields. Thus, an effective classification system has to robustly handle substantial environmental changes including varying weed pressure, various weed types, different growth stages, changing visual appearance of the plants and the soil. Our approach uses an end-to-end trainable fully convolutional network that simultaneously estimates plant stem positions as well as the spatial extent of crop plants and weeds. It jointly learns how to detect the stems and the pixel-wise semantic segmentation and incorporates spatial information by considering image sequences of local field strips. The jointly learned feature representation for both tasks furthermore exploits the crop arrangement information that is often present in crop fields.
This information is considered even if it is only observable from the image sequences and not a single image. Such image sequences, as typically provided by robots navigating over the field along crop rows, enable our approach to robustly estimate the semantic segmentation and stem positions despite the large variations encountered in different fields. We implemented and thoroughly tested our approach on images from multiple farms in different countries. The experiments show that our system generalizes well to previously unseen fields under varying environmental conditions — a key capability to deploy such systems in the real world. Compared to state-of-the-art approaches, our approach generalizes well to unseen fields and not only substantially improves the stem detection accuracy, i.e., distinguishing crop and weed stems, but also improves the semantic segmentation performance.
P. Lottes, J. Behley, A. Milioto, and C. Stachniss. Fully convolutional networks with sequential information for robust crop and weed detection in precision farming, IEEE Robotics and Automation Letters (RA-L), vol. 3, pp. 3097-3104, 2018.
Abstract Reducing the use of agrochemicals is an important component towards sustainable agriculture. Robots that can perform targeted weed control offer the potential to contribute to this goal, for example, through specialized weeding actions such as selective spraying or mechanical weed removal. A prerequisite of such systems is a reliable and robust plant classification system that is able to distinguish crop and weed in the field. A major challenge in this context is the fact that different fields show a large variability. Thus, classification systems have to robustly cope with substantial environmental changes with respect to weed pressure and weed types, growth stages of the crop, visual appearance, and soil conditions. In this paper, we propose a novel crop-weed classification system that relies on a fully convolutional network with an encoder-decoder structure and incorporates spatial information by considering image sequences. Exploiting the crop arrangement information that is observable from the image sequences enables our system to robustly estimate a pixel-wise labeling of the images into crop and weed, i.e., a semantic segmentation. We provide a thorough experimental evaluation, which shows that our system generalizes well to previously unseen fields under varying environmental conditions — a key capability to actually use such systems in precision farming. We provide comparisons to other state-of-the-art approaches and show that our system substantially improves the accuracy of crop-weed classification without requiring a retraining of the model.
P. Lottes, J. Behley, N. Chebrolu, A. Milioto, and C. Stachniss. Joint stem detection and crop-weed classification for plant-specific treatment in precision farming, Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
Abstract Applying agrochemicals is the default procedure for conventional weed control in crop production, but has negative impacts on the environment. Robots have the potential to treat every plant in the field individually and thus can reduce the required use of such chemicals. To achieve that, robots need the ability to identify crops and weeds in the field and must additionally select effective treatments. While certain types of weed can be treated mechanically, other types need to be treated by (selective) spraying. In this paper, we present an approach that provides the necessary information for effective plant-specific treatment. It outputs the stem location for weeds, which allows for mechanical treatments, and the covered area of the weed for selective spraying. Our approach uses an end-to-end trainable fully convolutional network that simultaneously estimates stem positions as well as the covered area of crops and weeds. It jointly learns the class-wise stem detection and the pixel-wise semantic segmentation. Experimental evaluations on different real-world datasets show that our approach is able to reliably solve this problem. Compared to state-of-the-art approaches, our approach not only substantially improves the stem detection accuracy, i.e., distinguishing crop and weed stems, but also provides an improvement in the semantic segmentation performance.
J. Behley, C. Stachniss. Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments, Proc. of Robotics: Science and Systems (RSS), 2018.
AbstractAccurate and reliable localization and mapping is a fundamental building block for most autonomous robots. For this purpose, we propose a novel, dense approach to laser-based mapping that operates on three-dimensional point clouds obtained from rotating laser sensors. We construct a surfel-based map and estimate the changes in the robot's pose by exploiting the projective data association between the current scan and a rendered model view from that surfel map. For detection and verification of a loop closure, we leverage the map representation to compose a virtual view of the map before a potential loop closure, which enables a more robust detection even with low overlap between the scan and the already mapped areas. Our approach is efficient and enables real-time capable registration. At the same time, it is able to detect loop closures and to perform map updates in an online fashion. Our experiments show that we are able to estimate globally consistent maps in large scale environments solely based on point cloud data.
J. Behley, V. Steinhage, A.B. Cremers. Efficient Radius Neighbor Search in Three-dimensional Point Clouds, Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2015.
AbstractFinding all neighbors of a point inside a given radius is an integral part in many approaches using three-dimensional laser range data. We present novel insights to significantly improve the runtime performance of radius neighbor search using octrees. Our contributions are as follows: (1) We propose an index-based organization of the point cloud such that we can efficiently store start and end indexes of points inside every octant and (2) exploiting this representation, we can use pruning of irrelevant subtrees in the traversal to facilitate highly efficient radius neighbor search. We show significant runtime improvements of our proposed octree representation over state-of-the-art neighbor search implementations on three different urban datasets.
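The pruning described in contribution (2) boils down to two geometric tests per octant during traversal. The sketch below shows both predicates under the usual octree conventions (octant given by its center and half-extent); the function names are mine, not from the paper's implementation. The point of the index-based organization in contribution (1) is that when the query sphere fully contains an octant, its whole contiguous index range [start, end] can be reported without recursing further.

```python
import numpy as np

def sphere_overlaps_octant(center, radius, octant_center, octant_extent):
    """True if the query ball intersects the octant's axis-aligned box.
    If False, the entire subtree rooted at this octant can be pruned."""
    d = np.abs(np.asarray(center, float) - np.asarray(octant_center, float))
    d = np.maximum(d - octant_extent, 0.0)   # per-axis distance from ball center to box
    return float(np.dot(d, d)) <= radius * radius

def sphere_contains_octant(center, radius, octant_center, octant_extent):
    """True if the query ball fully contains the octant. In that case all
    points in the octant's contiguous index range are results -- no
    per-point distance checks or recursion are needed."""
    d = np.abs(np.asarray(center, float) - np.asarray(octant_center, float)) + octant_extent
    return float(np.dot(d, d)) <= radius * radius
```

The containment test compares the farthest corner of the box against the radius; the overlap test compares the closest point of the box. Everything between these two cases is handled by descending into the children.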
J. Behley. Three-dimensional Laser-based Classification in Outdoor Environments, Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn, 2014.
AbstractRobotics research strives for deploying autonomous systems in populated environments, such as inner city traffic. Autonomous cars need a reliable collision avoidance, but also an object recognition to distinguish different classes of traffic participants. For both tasks, fast three-dimensional laser range sensors generating multiple accurate laser range scans per second, each consisting of a vast number of laser points, are often employed. In this thesis, we investigate and develop classification algorithms that allow us to automatically assign semantic labels to laser scans. We mainly face two challenges: (1) we have to ensure consistent and correct classification results and (2) we must efficiently process a vast number of laser points per scan. In consideration of these challenges, we cover both stages of classification -- the feature extraction from laser range scans and the classification model that maps from the features to semantic labels.
As for the feature extraction, we contribute by thoroughly evaluating important state-of-the-art histogram descriptors. We investigate critical parameters of the descriptors and experimentally show for the first time that the classification performance can be significantly improved using a large support radius and a global reference frame.
As for learning the classification model, we contribute with new algorithms that improve the classification efficiency and accuracy. Our first approach aims at deriving a consistent point-wise interpretation of the whole laser range scan. By combining efficient similarity-preserving hashing and multiple linear classifiers, we considerably improve the consistency of label assignments, requiring only minimal computational overhead compared to a single linear classifier.
In the last part of the thesis, we aim at classifying objects represented by segments. We propose a novel hierarchical segmentation approach comprising multiple stages and a novel mixture classification model of multiple bag-of-words vocabularies. We demonstrate superior performance of both approaches compared to their single component counterparts using challenging real world datasets.
J. Behley, V. Steinhage, A.B. Cremers. Laser-based Segment Classification Using a Mixture of Bag-of-Words, Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4195–4200, 2013.
AbstractIn this paper, we propose a segment-based object detection approach using laser range data. Our detection approach is built up of three stages: First, a hierarchical segmentation approach generates a hierarchy of coarse-to-fine segments to reduce the impact of over- and under-segmentation in later stages. Next, we employ a learned mixture model to classify all segments. The model combines multiple softmax regression classifiers learned on specific bag-of-word representations using different parameterizations of a descriptor. In the final stage, we filter irrelevant and duplicate detections using a greedy method in consideration of the segment hierarchy. We experimentally evaluate our approach on recently published real-world datasets to detect pedestrians, cars, and cyclists.
V. Steinhage, J. Behley, S. Meisel, A.B. Cremers. Reconstruction by components for automated updating of 3D city models, Applied Geomatics, Springer, 2013.
Abstract3D city models are of interest for various reasons like urban planning, environmental simulations of urban climate and noise pollution, disaster simulations, virtual tourism, virtual-heritage conservation, etc. To create and update large-scale 3D city models efficiently, automated approaches to 3D reconstruction are in great demand. Aside from efficiency, reliability and flexibility are of crucial importance. The derived reconstruction results should be reliable in that they correspond to the observed buildings in both their geometry and their structural topology. Flexibility should ensure the derivation of 3D reconstructions for the most common urban building structures without being limited in descriptive power to only some specific building types. To ensure efficiency, reliability, and flexibility of automated 3D building reconstruction, we propose an approach that combines two paradigms. First, we employ the fusion of information derived from different sensors and map data from a geographic information system. Second, we employ a semantic and component-based approach to model and reconstruct complex buildings. The derived geometrical and semantic building description is utilized within a spatial information system to support spatial and semantic queries for the maintenance and updating of the derived 3D city models.
J. Behley, V. Steinhage, A.B. Cremers. Performance of Histogram Descriptors for the Classification of 3D Laser Range Data in Urban Environments, Proc. of the IEEE International Conference on Robotics and Automation (ICRA), pp. 4391–4398, 2012.
Abstract The selection of suitable features and their parameters for the classification of three-dimensional laser range data is a crucial issue for high-quality results. In this paper we compare the performance of different histogram descriptors and their parameters on three urban datasets recorded with various sensors — sweeping SICK lasers, tilting SICK lasers and a Velodyne 3D laser range scanner. These descriptors are 1D, 2D, and 3D histograms capturing the distribution of normals or points around a query point. We also propose a novel histogram descriptor, which relies on the spectral values in different scales. We argue that choosing a larger support radius and a z-axis based global reference frame/axis can boost the performance of all kinds of investigated classification models significantly. The 3D histograms relying on the point distribution, normal orientations, or spectral values, turned out to be the best choice for the classification in urban environments.
Errata — In Section 2, there were two errors in the histogram index derivation: (1) the spin image index j must be computed as j = ⌊(δ + β) / (2ρ)⌋, and (2) the distribution histogram bin must be computed as ⌊(b/2) · (q′/δ + 1)⌋. In the linked version (draft), these errors are corrected.
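For concreteness, the two corrected index computations from the errata can be written out directly. The symbol roles here are inferred from the formulas (β and q′ as point coordinates within the support, δ as the support radius, ρ as the bin size, b as the number of bins) and should be checked against the paper.

```python
import math

def spin_image_row(beta, delta, rho):
    """Spin image index j = floor((delta + beta) / (2 * rho)), assuming
    beta in [-delta, delta] and bin size rho (roles inferred from the errata)."""
    return math.floor(0.5 * (delta + beta) / rho)

def distribution_bin(q, delta, b):
    """Distribution histogram bin floor((b/2) * (q/delta + 1)) for a
    coordinate q in [-delta, delta] quantized into b bins."""
    return math.floor(0.5 * b * (q / delta + 1.0))
```

Both formulas shift the signed coordinate into a non-negative range before quantizing, so the lowest bin index is 0 at the negative end of the support.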
F. Schöler, J. Behley, V. Steinhage, D. Schulz, A.B. Cremers. Person Tracking in Three-Dimensional Laser Range Data with Explicit Occlusion Adaption, Proc. of the IEEE International Conference on Robotics and Automation (ICRA), pp. 1297–1303, 2011.
Abstract This paper presents an approach to exploit the richer information of sensor data provided by 3D laser rangefinders for the purpose of person tracking. We introduce a method to adapt the observation model of a particle filter in order to identify partial and full occlusions of a person, and to determine the amount of occlusion behind an obstacle as well as the occluding obstacle itself. This is done by tracing rays from positions near the person to the sensor and determining whether the ray hits an obstacle. The laser range data is represented using a voxel grid, which facilitates efficient retrieval and data reduction. As our experiments show, our proposed tracking approach is able to reliably keep track of a person in real-time, even when only partially visible, when moving in uneven terrain, or when the person passes close to another person of different size.
J. Behley, K. Kersting, D. Schulz, V. Steinhage, A.B. Cremers. Learning to Hash Logistic Regression for Fast 3D Scan Point Classification, Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5960–5965, 2010.
Abstract Segmenting range data into semantic categories has become a more and more active field of research in robotics. In this paper, we advocate viewing this task as a problem of fast, large-scale retrieval. Intuitively, given a dataset of millions of labeled scan points and their neighborhoods, we simply search for similar points in the dataset and use the labels of the retrieved ones to predict the labels of a novel point using some local prediction model such as majority vote or logistic regression. However, actually carrying this out requires highly efficient ways of (1) storing millions of scan points in memory and (2) quickly finding similar scan points to a target scan point. In this paper, we propose to address both issues by employing Weiss et al.'s recent spectral hashing. It represents each item in a database by a compact binary code that is constructed so that similar items will have similar binary code words. In turn, similar neighbors have codes within a small Hamming distance of the code for the query. Then, we learn a logistic regression model locally over all points with the same binary code word. Our experiments on real world 3D scans show that the resulting approach, called spectrally hashed logistic regression, can be ultra fast at prediction time and outperforms state-of-the-art approaches such as logistic regression and nearest neighbor.
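The retrieval view described in the abstract can be sketched in a few lines: bucket training items by a short binary code, and label a query with the local prediction model of the bucket whose code is closest in Hamming distance. Two simplifications versus the paper, stated up front: a random-projection hash stands in for spectral hashing, and the local model is a majority vote (which the abstract also names) rather than a per-bucket logistic regression. All class and function names are illustrative.

```python
import numpy as np
from collections import Counter, defaultdict

def hamming(a, b):
    """Hamming distance between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")

class HashedMajorityVote:
    """Toy hashed-retrieval classifier: random sign projections produce an
    n_bits code per item; prediction uses the majority label of the bucket
    with the nearest code. Stand-in for spectral hashing + local models."""

    def __init__(self, n_bits=8, seed=0):
        self.n_bits = n_bits
        self.rng = np.random.default_rng(seed)
        self.proj = None
        self.buckets = defaultdict(list)   # code -> list of labels

    def _code(self, x):
        bits = (x @ self.proj) > 0.0
        return int("".join("1" if b else "0" for b in bits), 2)

    def fit(self, X, y):
        self.proj = self.rng.normal(size=(X.shape[1], self.n_bits))
        for xi, yi in zip(X, y):
            self.buckets[self._code(xi)].append(yi)
        return self

    def predict_one(self, x):
        c = self._code(x)
        nearest = min(self.buckets, key=lambda k: hamming(k, c))
        return Counter(self.buckets[nearest]).most_common(1)[0][0]
```

Prediction cost depends only on the number of occupied buckets, not on the number of training points, which is the source of the speed-up the abstract claims.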
V. Steinhage, J. Behley, S. Meisel, A. B. Cremers. Automated Updating and Maintenance of 3D City Models, ISPRS-Workshop on Core Spatial Databases - Updating, Maintenance and Services, 2010.
Abstract The automation of 3D building reconstruction is an ongoing topic of worldwide research. Currently, cities all over the world are heavily engaged in creating 3D city models for various reasons like town planning, urban climate and noise simulations, virtual tourism etc. Generally, companies are entrusted to generate such 3D city models. The current approaches of those companies are labour intensive and therefore cost intensive, since the reconstruction of each single building is based mainly on manual processing within a computer-supported editing framework. The same holds for internet providers of 3D city models which offer tools for the interactive construction of 3D models (e.g. the SketchUp tool of Google Earth). In cooperation with the land registry and surveying office of the city of Bonn, we developed an automated approach to 3D building reconstruction as well as a spatial information system for the maintenance of 3D city models. This contribution focuses on our approach to 3D building reconstruction, which employs a model-based data fusion from aerial images, airborne laser scanning and GIS. Furthermore, we describe our spatial information system to maintain the 3D city models. The spatial information system is based on an open source RDBMS and offers SQL-based spatial query functionality.
J. Behley, V. Steinhage. Generation of 3D City Models Using Domain-Specific Information Fusion, Proc. of the International Conference on Computer Vision Systems (ICVS), pp. 164–173, 2009.
Abstract In this contribution we present a building reconstruction strategy using spatial models of building parts and information fusion of aerial image, digital surface model and ground plans. The fusion of sensor data aims to reliably derive local building features and is therefore controlled in a domain-specific way: ground plans indicate the approximate location of outer roof corners and the intersection of planes from the digital surface model yields the inner roof corners. Parameterized building parts are selected using these corners and afterwards combined to form complete three-dimensional building models. We focus here on the domain-specific information fusion and present results on a suburban dataset.
V. Steinhage, J. Behley. Model-Driven Generation of 3D City Models using Information Fusion on Aerial Imagery, Laser Altimeter and Map Data, GFaI-Workshop 3D-NordOst, 2008.


Summer Term 15: "Knowledge-based Image Understanding" (Master) together with PD Dr. Volker Steinhage and Dominik A. Klein
Summer Term 14: "Knowledge-based Image Understanding" (Master) together with PD Dr. Volker Steinhage; Project group "Intelligente Sehsysteme" (Bachelor)
Summer Term 13: Project group "Intelligente Sehsysteme" (Bachelor)
Summer Term 12: Exercises "Autonomous Mobile Systems" (Master)
Winter Term 11/12: Project group "Intelligente Sehsysteme" (Bachelor)
Summer Term 11: Exercises "Autonomous Mobile Systems" (Master)
Winter Term 10/11: Project group "Intelligente Sehsysteme" (Bachelor)
Summer Term 10: Exercises "Autonomous Mobile Systems" (Master)
Winter Term 09/10: Project group "Intelligente Sehsysteme" (Bachelor)