This book presents cutting-edge research on various ways to bridge the semantic gap in image and video analysis. The respective chapters address the successive stages of image processing: the first step is feature extraction, the second is segmentation, the third is object recognition, and the fourth and last is the semantic interpretation of the image. The semantic gap, a challenging area of research, describes the difference between the low-level features extracted from an image and the high-level semantic meanings that people can derive from it. The quality of the semantic interpretation greatly depends on lower-level vision techniques, such as feature selection, segmentation, and object recognition. The use of deep models has freed humans from manually selecting and extracting the set of features: deep learning does this automatically, developing more abstract features at successive levels.