
Disparity and Depth Estimation From Stereo Camera

Xavier Rigoulet
May 23rd, 2022 · 4 min read

Can we estimate the distance to a vehicle with cameras only?

In fact, it is possible with a stereo camera system. In such a system, we use two cameras to estimate the disparity and the depth. From there, it is possible to estimate the distance to a vehicle.

You can see the final result of this project on the KITTI dataset below:

object detection and distance estimation with two cameras

But what is the disparity?

Depth and Disparity Estimation

Simply put, when two cameras mounted as a stereo pair capture the same point or object, that point appears at different coordinates in each image. The disparity is the distance between these two sets of coordinates. In other words, the disparity measures the displacement of a point between the two images. If we compute the disparity for every pixel of the image, the output is the disparity map.

disparity map

Computing the disparity map is essential because it allows us to extract the depth of the image, which tells us how far away a vehicle is and is also useful for other applications such as 3D reconstruction. If you are interested in 3D, I also wrote about 3D perception here and 3D deep learning here.

Once the disparity has been estimated, it is possible to estimate the depth. Once we have the depth, it becomes straightforward to calculate the distance. With an object detector, such as YOLO, we can detect the objects of interest and estimate their distance. But how does it work? Below is the diagram of the stereo camera model.

diagram of stereo camera model
Source: Daviddengcn, CC BY-SA 3.0, via Wikimedia Commons

The depth Z is the distance between a point P in the real world and the camera. This diagram presents a stereo vision system with two parallel cameras, C and C’. The distance B between the cameras is called the baseline, f is the focal length, and x and x’ are the horizontal coordinates of the projections of P on the image planes of the cameras C and C’.

By triangulation, we can compute the depth Z with the following formula, where (x - x’) is the disparity:

$$Z = \frac{f \cdot B}{x - x'}$$

From the equation above, it is essential to note that depth and disparity are inversely related. In other words, the greater the depth, the smaller the disparity, and the smaller the depth, the greater the disparity.
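To make the inverse relationship concrete, here is a minimal sketch of the formula in Python. The focal length and baseline are illustrative values chosen for easy arithmetic, not the calibration used in this project, and the function name `depth_from_disparity` is hypothetical. The focal length is expressed in pixels, so the depth comes out in the same unit as the baseline.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return the depth Z in meters, using Z = f * B / (x - x')."""
    if disparity_px <= 0:
        # A zero disparity would correspond to a point infinitely far away.
        raise ValueError("disparity must be strictly positive")
    return focal_px * baseline_m / disparity_px


# Illustrative values only (assumptions, not a real calibration).
FOCAL_PX = 700.0   # focal length, in pixels
BASELINE_M = 0.5   # distance between the two cameras, in meters

for disparity in (70.0, 35.0, 7.0):
    depth = depth_from_disparity(disparity, FOCAL_PX, BASELINE_M)
    print(f"disparity = {disparity:5.1f} px -> depth = {depth:5.1f} m")
```

With these values, halving the disparity from 70 to 35 pixels doubles the depth from 5 to 10 meters.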

Now we know how to get the depth, provided that we know the disparity. It is a crucial step, as it opens the door to 3D reconstruction and 3D computer vision in general. But wait! How do we compute the disparity?

Epipolar Geometry and Disparity Estimation

First, let’s look at a diagram of a stereo camera model where two cameras look at the same point X.

Epipolar Geometry
Source: Arne Nordmann (norro), CC BY-SA 3.0, via Wikimedia Commons

What is an Epipolar Line

The left camera sees the line OL–X as a single point, XL, because this line passes through its optical center OL. The right camera, however, sees the same line as a line in its image plane: the line eR–XR, called the epipolar line. Similarly, the right camera sees the line OR–X as a single point, while the left camera sees it as the epipolar line eL–XL. The goal is to find, along the epipolar line in the right image, the point XR that corresponds to XL, so that we can triangulate X and compute the disparity.

What is an Epipolar Plane

To generalize the previous explanation, the plane defined by X, OL and OR, shown as a green triangle in the previous diagram, is called the epipolar plane.

Observation on the Epipolar Constraint

If the relative position of the two cameras is known, it is possible to test whether two points correspond to the same 3D point, thanks to the epipolar constraint. The epipolar constraint means that the projection XR of X on the right image plane must lie on the epipolar line eR–XR.

Finally, it is important to note that epipolar constraints can be described algebraically by the fundamental matrix.
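As a small illustration, the constraint can be written as xR^T · F · xL = 0, where F is the fundamental matrix and xL, xR are pixel coordinates in homogeneous form (x, y, 1). The sketch below assumes an ideal rectified pair (cameras separated by a purely horizontal translation), for which F takes a particularly simple form up to scale. The function name and the pixel values are hypothetical.

```python
def epipolar_residual(F, x_left, x_right):
    """Return x_right^T * F * x_left for homogeneous pixel coordinates (x, y, 1)."""
    # F * x_left gives the coefficients of the epipolar line in the right image.
    line = [sum(F[i][j] * x_left[j] for j in range(3)) for i in range(3)]
    # The residual is zero when x_right lies exactly on that line.
    return sum(x_right[i] * line[i] for i in range(3))


# Fundamental matrix of an ideal rectified pair, up to scale (assumption).
F_RECTIFIED = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]

x_left = (320.0, 120.0, 1.0)
same_row = (300.0, 120.0, 1.0)    # candidate on the same image row
other_row = (300.0, 125.0, 1.0)   # candidate five rows lower

print(epipolar_residual(F_RECTIFIED, x_left, same_row))   # 0.0: constraint satisfied
print(epipolar_residual(F_RECTIFIED, x_left, other_row))  # -5.0: not a valid match
```

In this rectified case, the residual reduces to the difference between the two row coordinates, so two points can only match if they lie on the same row. With real data, the residual is compared against a small tolerance rather than exactly zero.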

Disparity Estimation With Semi-Global Block Matching

There are different methods to estimate the disparity. One traditional method is semi-global block matching. It is quick to implement, but it can lack accuracy and it remains computationally demanding. However, it can be a good starting point, and depending on your use case, it might already solve your problem. In fact, this algorithm is faster than global matching and more accurate than local block matching. It is also suitable for running on ASICs and FPGAs. Because of these improvements, it can be a reliable, real-time algorithm for robotics and autonomous driving.

Semi-global block matching is an intensity-based algorithm used to compute a dense disparity map from a pair of rectified stereo images. It works by analyzing the similarity between pixels in multiple directions.

To run this algorithm, it is crucial to use stereo-rectified images, so that the epipolar lines are parallel to the horizontal axis and corresponding pixels share the same vertical coordinate.
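To show why rectification matters, here is a deliberately simplified sketch of local block matching along a single image row, using the sum of absolute differences (SAD) as the similarity measure. This is not semi-global block matching: the semi-global variant additionally aggregates matching costs with smoothness penalties along several directions. The function name, window size and search range are assumptions for illustration.

```python
def match_scanline(left_row, right_row, block_size=3, max_disparity=8):
    """Estimate one disparity per pixel of a rectified row with local SAD block matching."""
    half = block_size // 2
    width = len(left_row)
    disparities = [0] * width
    for x in range(half, width - half):
        best_cost, best_d = None, 0
        # A point at column x in the left image appears at column x - d in the right image.
        for d in range(max_disparity + 1):
            if x - d - half < 0:
                break  # the window would fall outside the right image
            cost = sum(
                abs(left_row[x + k] - right_row[x - d + k])
                for k in range(-half, half + 1)
            )
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


# Synthetic row: the left view is the right view shifted by 3 pixels.
pattern = [10, 80, 30, 200, 50, 120, 90, 160, 20, 240, 70, 140]
right_row = pattern + [0, 0, 0]
left_row = [0, 0, 0] + pattern
print(match_scanline(left_row, right_row))
```

Because the rows are rectified, the search for a match is one-dimensional: only the column changes, never the row. Away from the borders, the sketch recovers the 3-pixel shift.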

The image below shows the disparity map computed on the KITTI dataset using the semi-global block matching algorithm.

disparity map from stereo semi-global block matching

If the results of the semi-global block matching algorithm are not good enough for your use case, the next step is to implement a deep learning solution, although it takes more time to develop because it is more complex. Deep learning for stereo matching is currently a very active research area.

Depth Map Estimation From Disparity

As explained above, in order to estimate the depth, we first need to compute the disparity, as we just did. From there, we can output the depth map by implementing the equation above. Below is the depth map I got from the disparity map.

depth map from stereo camera with StereoSGBM
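The conversion from disparity map to depth map can be sketched with NumPy as follows. The function name, the threshold below which a disparity is treated as invalid, and the choice to mark invalid pixels as infinitely far are all assumptions made for this illustration.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, min_disparity=0.1):
    """Convert a disparity map (pixels) into a depth map (meters) with Z = f * B / d."""
    disparity = np.asarray(disparity, dtype=np.float32)
    # Pixels without a usable disparity are marked as infinitely far away.
    depth = np.full(disparity.shape, np.inf, dtype=np.float32)
    valid = disparity > min_disparity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Tiny example with illustrative calibration values (not a real calibration).
example = np.array([[70.0, 35.0], [0.0, -1.0]])
print(disparity_to_depth(example, focal_px=700.0, baseline_m=0.5))
```

Masking matters here because stereo matchers typically leave some pixels without a valid disparity, and dividing by zero or by a negative value would produce meaningless depths.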

With two cameras, we now have the depth value of each pixel. In other words, we removed the 2D barrier due to the data structure of images and we can work in 3D. In autonomous driving, it means that with two cameras, we are able to estimate the distance between our vehicle and an obstacle, which is essential for localization, motion planning and control of the driverless car.
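One possible way to turn the depth map into a single distance per detected object is to summarize the depth values inside the object's bounding box. The sketch below uses the median, which is an assumption made for this illustration rather than a description of the exact method used in this project. The bounding box would come from an object detector such as YOLO, and the function name is hypothetical.

```python
import numpy as np


def distance_in_box(depth_map, box):
    """Return the median depth (meters) inside a box (x_min, y_min, x_max, y_max), or None."""
    x_min, y_min, x_max, y_max = box
    patch = np.asarray(depth_map)[y_min:y_max, x_min:x_max]
    # Ignore pixels with no usable depth (infinite, NaN, or non-positive values).
    valid = patch[np.isfinite(patch) & (patch > 0)]
    if valid.size == 0:
        return None
    return float(np.median(valid))


# Synthetic scene: background at 40 m, one object at 12 m with a missing pixel.
depth = np.full((6, 8), 40.0)
depth[2:5, 3:6] = 12.0
depth[3, 4] = np.inf
print(distance_in_box(depth, (3, 2, 6, 5)))
```

The median is less sensitive than the mean to background pixels and matching errors that end up inside the box.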

Closing Thought on Stereo Vision

In this article, we briefly learned that it is possible to estimate the distance with at least two cameras, despite having images in 2D, by estimating the disparity and the depth of the image.

If you are curious, you can watch the final result here, where I go one step further by adding an object detector and getting the distance to a vehicle.

© 2021–2022 Digital Nuage