Monocular Depth Estimation: Datasets, Methods, and Applications

Bauer, Zuria

Supervised by:

Miguel Cazorla Quevedo (thesis supervisor)
Sergio Orts Escolano (thesis supervisor)

Defended at: Universitat d'Alacant / Universidad de Alicante

Defense date: 15 September 2021

Examination committee:

Ester Martínez Martín (president)
Francisco Gómez Donoso (secretary)
José Carlos Rangel Ortiz (member)

Subject areas:

Computer Science and Artificial Intelligence

Type: Doctoral thesis

Teseo: 675503

Abstract

The World Health Organization (WHO) stated in February 2021 at the Seventy-Third World Health Assembly that, globally, at least 2.2 billion people have a near or distance vision impairment. It also noted the severe impact that vision impairment has on the quality of life of affected individuals: it undermines their social well-being and economic independence and, in some cases, places an additional burden on the people in their immediate surroundings. To minimize the cost and intrusiveness of assistive applications and to maximize the autonomy of the individual, a natural solution is to use systems that rely on computer vision algorithms. Systems that improve the quality of life of the visually impaired need to solve several problems, such as localization, path recognition, obstacle detection, environment description, and navigation. Each of these topics involves an additional set of subproblems. For example, object detection requires depth prediction to know the distance to the object, path recognition to know whether the user is on the road or on a pedestrian path, an alarm system to notify the user of danger, and trajectory prediction for approaching obstacles, and these are only the key points. On closer inspection, all of these topics share one key component: they depend on a correct estimate of the depth of the scene. In this thesis, our main focus is depth estimation in indoor and outdoor environments. Traditional depth estimation methods, such as structure from motion and stereo matching, are built on feature correspondences across multiple viewpoints. Despite their effectiveness, these approaches need a specific type of data to perform properly, namely several views of the same scene.
Since our main goal is to provide systems with minimal cost and intrusiveness that are also easy to handle, we decided to infer depth from a single RGB image, a task known as monocular depth estimation. Estimating the depth of a scene from a single image is a simple task for humans, but it is notoriously difficult for computational models to do so with high accuracy and low resource requirements. Because only one image is needed, this approach is used in applications such as autonomous driving, scene understanding, and 3D modeling, where other types of information are not available. This thesis presents contributions towards solving this task using deep learning as the main tool. The four main contributions are: first, an extensive review of the state of the art in monocular depth estimation; second, a novel large-scale, high-resolution outdoor stereo dataset that provides enough image information to address various common computer vision problems; third, a set of architectures that predict monocular depth effectively; and finally, two real-life applications of those architectures that aim to enhance perception for the visually impaired using low-cost wearable sensors.