Deep learning for 3D perception: computer vision and tactile sensing

  1. Alberto García-García
Supervised by:
  1. José García Rodríguez Supervisor
  2. Sergio Orts Escolano Co-supervisor

Defending university: Universitat d'Alacant / Universidad de Alicante

Date of defense: 28 October 2019

Jury:
  1. José María Cecilia Canales President
  2. Jorge Azorín López Secretary
  3. Alexandra Psarrou Rapporteur
Department:
  1. TECNOLOGIA INFORMATICA Y COMPUTACION

Type: Thesis

Teseo: 606094 · DIALNET · RUA

Abstract

The care of dependent people (due to aging, accidents, disabilities, or illnesses) is one of the top-priority research lines for European countries, as stated in the Horizon 2020 goals. To minimize the cost and intrusiveness of care and rehabilitation therapies, it is desirable that such care be administered at the patient’s home. The natural solution for this environment is an indoor mobile robotic platform. Such a robotic platform for home care needs to solve, to a certain extent, a set of problems lying at the intersection of multiple disciplines, e.g., computer vision, machine learning, and robotics. At that crossroads, one of the most notable challenges (and the one we focus on) is scene understanding: the robot needs to understand the unstructured and dynamic environment in which it navigates and the objects with which it can interact.

In this thesis we focus on three core tasks for full scene understanding: object class recognition, semantic segmentation, and grasp stability prediction. The first refers to categorizing an object into a set of classes (e.g., chair, bed, or pillow); the second goes one level beyond object categorization and aims to provide a per-pixel dense labeling of each object in an image; the third consists in determining whether an object grasped by a robotic hand is in a stable configuration or will fall.

This thesis presents contributions towards solving those three tasks using deep learning as the main tool. All the solutions share one core observation: they rely on three-dimensional data inputs to leverage that additional dimension and its spatial arrangement. The four main contributions of this thesis are: first, we present a set of architectures and data representations for 3D object classification using point clouds; second, we carry out an extensive review of the state of the art in semantic segmentation datasets and methods; third, we introduce a novel synthetic, large-scale, photorealistic dataset for jointly solving various robotic and vision problems; finally, we propose a novel method and representation to deal with tactile sensors and learn to predict grasp stability.
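To give a concrete flavor of the kind of model the first contribution concerns (3D object classification over point clouds), below is a minimal PointNet-style classifier sketched in PyTorch. This is a generic illustration under stated assumptions, not the architectures evaluated in the thesis: the class name `PointCloudClassifier`, the layer widths, the class count, and the omission of input/feature alignment transforms are all choices made here for brevity.

```python
# Minimal PointNet-style point-cloud classifier (illustrative sketch only;
# layer sizes and class count are assumptions, not the thesis architectures).
import torch
import torch.nn as nn

class PointCloudClassifier(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared per-point MLP, implemented as 1x1 1D convolutions so the
        # same weights are applied independently to every point.
        self.features = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, 3, num_points) -- xyz coordinates of each cloud.
        x = self.features(points)
        # Max-pool over the point dimension: a symmetric function, so the
        # result does not depend on the (arbitrary) ordering of the points.
        x = torch.max(x, dim=2).values        # -> (batch, 1024)
        return self.classifier(x)             # -> (batch, num_classes)

# Example: classify a batch of 8 clouds with 1024 points each.
model = PointCloudClassifier(num_classes=10)
logits = model(torch.randn(8, 3, 1024))       # -> shape (8, 10)
```

The max-pooling step is the key design choice in this family of models: because a point cloud is an unordered set, the network must aggregate per-point features with an order-invariant (symmetric) operation before classification.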