Estimating depth from 2D images is often used in scene reconstruction, 3D object recognition, segmentation, and detection. Based on a single RGB image as input, we are predicting a depth map for the entire scene.
Encoder Takes an input image and generates a high-dimensional feature vector Decoder Takes a high-dimensional feature vector and generates a semantic segmentation mask There are 3 major building blocks: Convolution Down-Sampling Up-Sampling
In this example, we will show how can we blur part of the background and emphasize the foreground. Starting with the original image:
We are calculating depth map and converting it into grayscale:
Based on the given threshold we extract the foreground and inverted mask
This helps us to make foreground image with transparent background:
When we concatenate this image with blured version of original image:
We get the end result: