Can we estimate the distance to a vehicle with cameras only?
Yes, it is possible with a stereo camera system. In such a system, we use two cameras to estimate the disparity and, from it, the depth. From there, it is possible to estimate the distance to a vehicle.
You can visualize the final result of this project on the KITTI dataset below:
But what is the disparity?
Depth and Disparity Estimation
Simply put, a point or an object can be captured by two cameras mounted in a stereo fashion, but this point will have different coordinates. The disparity is the distance between these two sets of coordinates. In other words, the disparity measures the displacement of points between two images. If we compute the disparity for each pixel on the image, the output will be the disparity map.
Computing the disparity map is essential because it allows us to extract the depth of the image, which tells us how far away a vehicle is and is also useful for other applications such as 3D reconstruction. If you are interested in 3D, I also wrote about 3D perception here and 3D deep learning here.
Once the disparity has been estimated, it is possible to estimate the depth. Once we have the depth, it becomes straightforward to calculate the distance. With an object detector, such as YOLO, we can detect the objects of interest and estimate their distance. But how does it work? Below is the diagram of the stereo camera model.
Source: Daviddengcn, CC BY-SA 3.0, via Wikimedia Commons
The depth Z is the distance between a point P in the real world and the camera. This diagram presents a stereo vision system with two parallel cameras, C and C’. The distance B between the cameras is called the baseline, f is the focal length, and x and x’ are the horizontal coordinates of the projections of P on the image planes of the cameras C and C’.
By triangulation, we can compute the depth Z with the following formula, where (x - x’) is the disparity:

$$Z = \frac{f \cdot B}{x - x'}$$
From the equation above, it is essential to note that depth and disparity are inversely related. In other words, the greater the depth, the smaller the disparity, and the smaller the depth, the greater the disparity.
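To make this inverse relationship concrete, here is a minimal sketch in Python. The focal length (in pixels) and the baseline (in meters) are illustrative values assumed for the example, not calibration values from this project, and `depth_from_disparity` is a hypothetical helper name.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a pair of parallel cameras.

    disparity_px: disparity x - x' in pixels (must be positive)
    focal_px:     focal length expressed in pixels (assumed value)
    baseline_m:   distance between the two cameras in meters (assumed value)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to triangulate a depth")
    return focal_px * baseline_m / disparity_px


# Assumed, illustrative calibration: f = 700 px, B = 0.5 m.
FOCAL_PX = 700.0
BASELINE_M = 0.5

for d in (70.0, 35.0, 7.0):
    z = depth_from_disparity(d, FOCAL_PX, BASELINE_M)
    print(f"disparity = {d:5.1f} px -> depth = {z:5.1f} m")
```

With these assumed values, halving the disparity doubles the depth, and small disparities correspond to far-away points.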
Now we know how to get the depth, provided that we know the disparity. It is a crucial step as it opens the door to 3D reconstruction and 3D computer vision in general. But wait! How do we compute the disparity?
Epipolar Geometry and Disparity Estimation
First, let’s look at a diagram of a stereo camera model where two cameras look at the same point X.
Source: Arne Nordmann (norro), CC BY-SA 3.0, via Wikimedia Commons
What is an Epipolar Line
The line OL–X is seen by the left camera as a single point, xL, because it passes directly through the optical center OL of that camera. In the right camera, however, the same line appears as a line in the image, eR–xR, which is called an epipolar line. Symmetrically, the line OR–X is seen as a single point by the right camera and as the epipolar line eL–xL by the left camera. The goal is to find, for a point in the left image, the corresponding point on the epipolar line in the right image, so that we can compute the disparity between the two.
What is an Epipolar Plane
To generalize the previous explanation, the plane defined by X, OL, and OR, shown as a green triangle on the previous diagram, is called the epipolar plane.
Observation on the Epipolar Constraint
If the relative position of the two cameras is known, it is possible to test whether two image points correspond to the same 3D point thanks to the epipolar constraint. The epipolar constraint means that the projection xR of X on the right camera plane must lie on the epipolar line eR–xR.
Finally, it is important to note that epipolar constraints can be described algebraically by the fundamental matrix.
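As a small illustration of that algebraic form, the constraint is usually written as xRᵀ F xL = 0, with the image points expressed in homogeneous pixel coordinates. The sketch below assumes the special case of an ideal rectified pair (identical cameras separated by a purely horizontal translation), for which F is, up to scale, the matrix shown. For a real rig, F would come from calibration. The helper name and the sample points are hypothetical.

```python
import numpy as np

# Fundamental matrix of an ideal rectified stereo pair, up to scale.
# Assumption: identical cameras separated by a purely horizontal translation.
F_RECTIFIED = np.array([
    [0.0, 0.0,  0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0,  0.0],
])


def epipolar_residual(point_left, point_right, F):
    """Return x_R^T F x_L for two pixel coordinates given as (x, y)."""
    x_l = np.array([point_left[0], point_left[1], 1.0])
    x_r = np.array([point_right[0], point_right[1], 1.0])
    return float(x_r @ F @ x_l)


# Same row in both images: the residual is zero, the constraint holds.
same_row = epipolar_residual((420.0, 200.0), (400.0, 200.0), F_RECTIFIED)
# Different rows: the residual is non-zero, so this cannot be the same 3D point.
other_row = epipolar_residual((420.0, 200.0), (400.0, 215.0), F_RECTIFIED)
print(same_row, other_row)
```

In this rectified case the residual reduces to the difference between the two row coordinates, which is why corresponding points must lie on the same image row.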
Disparity Estimation With Semi-Global Block Matching
There are different methods to estimate the disparity. One traditional method is semi-global block matching. It is quick to implement, but it can lack accuracy and it is still computationally intensive. However, it can be a good starting point, and depending on your use case, it might already solve your problem. In fact, this algorithm is faster than global matching and more accurate than local block matching. It is also suitable for running on ASICs and FPGAs. Because of these properties, it can be a reliable, real-time algorithm for robotics and autonomous driving.
Semi-global block matching is an intensity-based algorithm used to compute a dense disparity map from a pair of rectified stereo images. It works by aggregating pixel matching costs along multiple directions across the image.
To run this algorithm, it is crucial to use stereo-rectified images, so that the epipolar lines are parallel to the horizontal axis and corresponding pixels share the same vertical coordinate.
The image below shows the disparity map computed on the KITTI dataset using the semi-global block matching algorithm.
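To see why rectification matters, here is a deliberately simple sketch of a local block matcher. This is not semi-global block matching, only the basic search it builds on. Because the images are assumed to be rectified, the match for a pixel in the left image is searched only along the same row of the right image. The function name, the window size, and the synthetic images are assumptions made for the example.

```python
import numpy as np


def match_along_row(left, right, row, col, half_window=3, max_disparity=32):
    """Find the disparity of left[row, col] by searching the same row of `right`.

    The matching cost is the sum of absolute differences (SAD) over a square window.
    """
    h = half_window
    patch = left[row - h:row + h + 1, col - h:col + h + 1].astype(np.int32)
    best_disparity, best_cost = 0, None
    for d in range(max_disparity + 1):
        c = col - d  # candidate column in the right image
        if c - h < 0:
            break  # the window would fall outside the image
        candidate = right[row - h:row + h + 1, c - h:c + h + 1].astype(np.int32)
        cost = int(np.abs(patch - candidate).sum())
        if best_cost is None or cost < best_cost:
            best_disparity, best_cost = d, cost
    return best_disparity


# Synthetic rectified pair: every point appears 12 px further left in the right image.
rng = np.random.default_rng(1)
left = rng.integers(0, 256, size=(60, 160), dtype=np.uint8)
right = np.roll(left, -12, axis=1)

found = match_along_row(left, right, row=30, col=100)
print(found)
```

Without rectification, the search would have to follow a slanted epipolar line for every pixel instead of a single image row.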
If the results with the semi-global block matching algorithm are not good enough for your use case, the next step is to implement a deep learning solution, but it takes more time to develop as it is more complex. At the moment, it is a very active research area.
Depth Map Estimation From Disparity
As explained above, in order to estimate the depth, we first need to compute the disparity, as we just did. From there, we can output the depth map by implementing the equation above. Below is the depth map I got from the disparity map.
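A minimal sketch of that conversion, applied to a whole disparity map with NumPy, could look like the following. It uses the relation Z = f * B / d, where d is the disparity. The focal length and baseline are again illustrative assumptions, and marking near-zero disparities as invalid is a choice made for this example rather than something prescribed by the method.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, min_disparity=0.1):
    """Convert a disparity map (pixels) to a depth map (meters) with Z = f * B / d.

    Pixels whose disparity is below `min_disparity` are treated as invalid
    and set to NaN, which avoids dividing by zero.
    """
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.full(disparity.shape, np.nan, dtype=np.float32)
    valid = disparity >= min_disparity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Tiny disparity map with one invalid (zero) pixel; f and B are assumed values.
disparity_map = np.array([[70.0, 35.0], [7.0, 0.0]])
depth_map = disparity_to_depth(disparity_map, focal_px=700.0, baseline_m=0.5)
print(depth_map)
```

Keeping invalid pixels as NaN makes it easy to ignore them later, for example when averaging the depth over a region.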
With two cameras, we now have the depth value of each pixel. In other words, we are no longer limited to the 2D structure of the images and we can work in 3D. In autonomous driving, it means that with two cameras, we are able to estimate the distance between our vehicle and an obstacle, which is essential for localization, motion planning and control of the driverless car.
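One simple way to turn a depth map into a distance to a detected obstacle is to take a robust statistic of the depth inside the detection's bounding box. The sketch below uses the median, which is an assumption on my part and not necessarily what a production system would do. The bounding box is a hypothetical detector output and the depth map is a toy example.

```python
import numpy as np


def object_distance(depth_map, box):
    """Median depth (meters) inside a bounding box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    region = depth_map[y_min:y_max, x_min:x_max]
    valid = region[np.isfinite(region) & (region > 0)]
    if valid.size == 0:
        return None  # no usable depth inside the box
    return float(np.median(valid))


# Toy depth map: background at 40 m, a "vehicle" at 12 m, one invalid pixel.
depth_map = np.full((100, 200), 40.0, dtype=np.float32)
depth_map[40:80, 60:120] = 12.0
depth_map[50, 70] = np.nan

vehicle_box = (55, 35, 125, 85)  # hypothetical detector output, slightly too large
distance = object_distance(depth_map, vehicle_box)
print(distance)
```

The median is used here because a bounding box usually includes some background pixels, and a plain mean would be pulled toward their larger depth values.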
Closing Thought on Stereo Vision
In this article, we briefly learned that it is possible to estimate the distance with at least two cameras, despite having images in 2D, by estimating the disparity and the depth of the image.
If you are curious, you can watch the final result here, where I go one step further adding an object detector and getting the distance to a vehicle.