Early Sensor Fusion and Object Detection for Self-Driving Cars

Xavier Rigoulet
August 31st, 2022 · 5 min read

Discover how a car can estimate its distance to another vehicle in real time.

In a previous article, I explained the process of projecting a 3D point cloud onto a 2D image.

In this article, we will go further by performing object detection with a deep learning model. By the end, we will have the distance between our car and the other vehicles in real time.

This technology is essential in the perception pipeline of autonomous vehicles. If the algorithm can estimate the distance separating us from an obstacle ahead, it can adapt the car's speed and make appropriate decisions, such as an emergency brake if needed.

You can watch the final result on YouTube.

Let’s get right into it! But first, let me provide some context, because there are different types of sensor fusion.

Combining raw data from a camera with a LiDAR or any other sensor is called low-level sensor fusion, because the fusion happens on raw data, before performing object detection.

If instead we perform sensor fusion after object detection on both the camera and the LiDAR, we talk about mid-level sensor fusion.

Finally, if we have object tracking on both devices, we talk about high-level sensor fusion.

In this article, we talk about low-level fusion, which is also called early sensor fusion. Mid-level and high-level sensor fusions are what we call late sensor fusion.

In late sensor fusion, we perform 2D object detection on the image and 3D object detection on the LiDAR point cloud before fusing the sensors.

If early sensor fusion is about raw data, late sensor fusion is about objects.

It is important to note that early sensor fusion is nowadays preferred because it is safer. With early sensor fusion, we can build a safety bubble around the vehicle, meaning that even if the algorithm fails to detect an object, we can still stop the car.

On the other hand, late sensor fusion relies entirely on object detection; if the system fails to detect an object, the whole pipeline fails.

To complete the process of early sensor fusion, we need to perform the three following steps:

  1. Project 3D point cloud onto 2D images
  2. 2D Object Detection
  3. Outlier removal

Since I already explained the first step of early fusion, I will only go through steps 2 and 3.

However, if you are not familiar with it, I recommend reading that article first before continuing here.

Data Aggregation

The data used for this project were collected by a real car equipped with a rotating LiDAR and a stereo camera setup with four cameras, as seen below:

Volkswagen Passat
Passat diagram top view

The car is a modified Volkswagen Passat B6 with the following sensors:

  • 1 Inertial Navigation System (GPS/IMU): OXTS RT 3003
  • 1 Laser scanner: Velodyne HDL-64E
  • 2 Grayscale cameras, 1.4 megapixels: Point Grey Flea 2 (FL2-14S3M-C)
  • 2 Color cameras, 1.4 megapixels: Point Grey Flea 2 (FL2-14S3C-C)
  • 4 Varifocal lenses, 4-8 mm: Edmund Optics NT59-917

You can find more information about the data and the car used to collect it on the KITTI Vision Benchmark Suite website.

3D point cloud projection onto a 2D image

As I mentioned earlier, I will not detail how to project a 3D point cloud onto a 2D image, but I believe it is necessary to show the final output before going further.

3D point cloud to 2D image

The image above shows the 3D points projected onto the 2D image. The color of each point changes depending on its distance from our sensor. This step is essential because combining different sensors reduces the uncertainty caused by each sensor's limitations.
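As a minimal sketch of that projection step, here is what it might look like with KITTI-style calibration matrices (`P`, `R0`, and `Tr_velo_to_cam` are the usual KITTI names; the exact matrix shapes used here are my simplification of the devkit conventions):

```python
import numpy as np

def project_lidar_to_image(points, P, R0, Tr_velo_to_cam):
    """Project Nx3 LiDAR points into the image plane.

    P: 3x4 camera projection matrix, R0: 3x3 rectification matrix,
    Tr_velo_to_cam: 3x4 LiDAR-to-camera transform (KITTI-style).
    Returns pixel coordinates (u, v) and depths for points in front of the camera.
    """
    n = points.shape[0]
    pts_h = np.hstack([points, np.ones((n, 1))])   # Nx4 homogeneous coordinates
    cam = R0 @ (Tr_velo_to_cam @ pts_h.T)          # 3xN points in camera frame
    cam_h = np.vstack([cam, np.ones((1, n))])      # 4xN homogeneous again
    img = P @ cam_h                                # 3xN image-plane coordinates
    uv = img[:2] / img[2]                          # perspective divide
    depth = cam[2]                                 # forward distance (camera z)
    keep = depth > 0                               # drop points behind the camera
    return uv[:, keep].T, depth[keep]
```

Points behind the camera are discarded, since they would otherwise project onto the image with meaningless coordinates.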

If you are interested in learning more about point cloud data, and in particular how to use neural networks with 3D data, I have written an article about point cloud and neural networks.

Next, we perform 2D object detection.

2D Object Detection with Deep Learning

Object detectors can be divided into two categories of algorithms: Region Proposal Detectors and One-Shot Detectors.

In a region-proposal detector such as R-CNN, the network first proposes around 2,000 regions possibly containing an object. Then, convolutional features are computed with a CNN for each proposal. Finally, each proposal is classified with the help of a linear SVM.

On the other hand, One-Shot Detectors do not need region proposals and directly regress bounding box locations. YOLO is an example of a One-Shot Detector and is our algorithm of choice for this project.

YOLO stands for You Only Look Once. This algorithm works by dividing an image into a grid, usually of dimensions 13x13. For each cell, the model predicts two boxes with their confidence scores. Then, the model predicts a class probability for each cell and combines the boxes with the class predictions. The final step is to apply Non-Maximum Suppression and confidence thresholding.
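To illustrate that last step, here is a minimal sketch of confidence thresholding followed by Non-Maximum Suppression (the `(x1, y1, x2, y2)` box format and the threshold values are my own assumptions, not YOLO specifics):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.4):
    """Return indices of kept boxes: low-confidence boxes are dropped,
    and overlapping duplicates are suppressed, best score first."""
    order = np.argsort(scores)[::-1]
    order = [i for i in order if scores[i] >= score_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

In practice a framework's built-in NMS would be used, but the logic is the same: keep the highest-scoring box and discard any remaining box that overlaps it too much.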


The YOLO architecture is a Deep Learning model with 24 convolutional layers.

YOLO architecture

For more details, you can read the research paper and access the code.

Here is the result of the object detection:

2D object detection

Once we have the 2D object detection on the image, we need to fuse the point cloud and the bounding boxes.

Fuse Point Cloud and Bounding Boxes

At this stage, when we fuse the point cloud with the bounding boxes, here is our result:

fusion point cloud and bounding boxes

Now, we need to remove the irrelevant points.

The first step is to remove the points outside the bounding boxes with a simple if-else statement.
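That first filter can be sketched as a simple bounds check (assuming the projected points are `(u, v)` pixel coordinates and the boxes are `(x1, y1, x2, y2)` tuples, which are my conventions here):

```python
import numpy as np

def points_in_box(uv, box):
    """Boolean mask of projected points (u, v) that fall inside box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
            (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
```

The mask can then be used to index both the pixel coordinates and the corresponding depths, so each detected object keeps only its own points.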

Next, we remove the points that are inside the bounding boxes but irrelevant, because these points are sources of error.

This process is called outlier removal. The outliers are the points that lie inside the box but are not part of the object.

outliers inside bounding box

This step is essential to ensure accuracy in estimating the distance between the obstacle and our vehicle.

There are several methods to perform it. A standard solution is to use a shrink factor. In other words, instead of considering the whole box, we consider only a part of it.

A common practice is to shrink the box by 10 to 15% so that only the most relevant points are kept.

original bounding box

The image above shows that while the points are inside the bounding box, not all the points belong to the detected object.

bounding box shrunk by 10%

Shrinking the box by 10% reduces the number of outliers, but some points that do not belong to the object remain.

bounding box shrunk by 20%

Shrinking the size of the bounding box by 20% improves the result, but we still need to optimize further as unwanted points remain.
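As a sketch, the shrink factor simply pulls each edge of the box toward its center (the `(x1, y1, x2, y2)` tuple format is my assumption):

```python
def shrink_box(box, factor=0.1):
    """Shrink a box (x1, y1, x2, y2) toward its center by `factor` per dimension."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * factor / 2   # trim half the margin from each side
    dy = (y2 - y1) * factor / 2
    return (x1 + dx, y1 + dy, x2 - dx, y2 - dy)
```

With `factor=0.2`, a 100x100 box becomes an 80x80 box centered at the same point.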

We can reduce their number further by using the sigma rule, which removes outliers based on how many standard deviations (sigma) a point lies from the mean.

normal distribution

In this case, “1-sigma” is one standard deviation from the norm (i.e., the mean or average), “2-sigma” is two standard deviations from the norm, and “3-sigma” represents three standard deviations from the norm.
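Applied to the depths of the points inside a box, the sigma rule might look like this minimal sketch (the function name and the `k` parameter are illustrative):

```python
import numpy as np

def sigma_filter(distances, k=1.0):
    """Keep only the distances within k standard deviations of the mean."""
    d = np.asarray(distances, dtype=float)
    mu, sigma = d.mean(), d.std()
    return d[np.abs(d - mu) <= k * sigma]
```

A single far-away point leaking through the shrunk box (say, a wall behind the car) lies far from the mean depth of the object and gets discarded.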

The final step is to select, for each bounding box, which point's distance will serve as the reference distance to the object.


Choosing the median, the average, the closest, or even a random point is possible. It is usually safest to select the closest point, since it yields the most conservative distance estimate.
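A sketch of that choice, applied to the inlier depths of one detected object (the strategy names are my own labels):

```python
import numpy as np

def object_distance(depths, strategy="closest"):
    """Pick one reference distance per detected object from its inlier point depths."""
    d = np.asarray(depths, dtype=float)
    if strategy == "closest":
        return float(d.min())        # conservative: the nearest point to our vehicle
    if strategy == "median":
        return float(np.median(d))   # robust to a few leftover outliers
    if strategy == "average":
        return float(d.mean())
    raise ValueError(f"unknown strategy: {strategy}")
```

The minimum is the safest choice for braking decisions, while the median is less sensitive to any outliers that survived the previous filtering steps.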

Finally, here is the final result:

early sensor fusion final result

Closing Thoughts on Early Sensor Fusion for Self-Driving Cars

This article and the previous one explained the complete process of early sensor fusion for self-driving cars.

We have learned the complete process of early sensor fusion, in this case, between a LiDAR and a camera. Once the 3D point cloud was projected onto the 2D image, we learned to perform 2D object detection using YOLO. Finally, we learned how to remove the outliers to ensure the accuracy of the output.

While this article illustrates the application of this technology with a LiDAR and cameras, it also works with other sensors, such as a camera and a radar. This technology finds many applications in robotics, drones, augmented reality, etc.

In the following article, I will go through late sensor fusion and 3D deep learning. To stay in touch with my future articles, feel free to join my newsletter and to connect with me on LinkedIn.
