Synthetic Data Generation for 6D Object Pose and Grasping Estimation

Author:
  1. Martínez González, Pablo

Supervised by:
  1. José García Rodríguez Director
  2. Sergio Orts Escolano Director

Defence university: Universitat d'Alacant / Universidad de Alicante

Date of defence: 16 March 2023

Committee:
  1. José María Cecilia Canales Chair
  2. Jorge Azorín López Secretary
  3. Alexandra Psarrou Member
Department:
  1. Tecnología Informática y Computación

Type: Thesis

Teseo: 797416 · DIALNET · RUA

Abstract

Teaching a robot how to behave so that it becomes fully autonomous is not a simple task. Once robotic systems become truly intelligent, interacting with them will feel natural and easy, but today nothing could be further from the truth. Making a robot understand its surroundings is an enormous challenge that the computer vision field tries to address, and deep learning techniques are bringing us closer, but at the cost of data. Synthetic data generation is the process of producing artificial data for training machine learning models. This data is generated with computer algorithms and simulations, and is designed to resemble real-world data as closely as possible. The use of synthetic data has become increasingly popular in recent years, particularly in deep learning, due to the shortage of high-quality annotated real-world data and the high cost of collecting it. For that reason, this thesis addresses the task of facilitating synthetic data generation by building a framework that leverages advances in modern rendering engines. The generated synthetic data can then be used to train models for tasks such as 6D object pose estimation or grasp estimation. 6D object pose estimation is the problem of determining the position and orientation of an object in 3D space, while grasp estimation involves predicting the position and orientation of a robotic hand or gripper that can pick up and manipulate the object. Both are important tasks in robotics and computer vision, as they enable robots to perform complex manipulation and grasping tasks. In this work we also propose a way of extracting grasping information from hand-object interactions in virtual reality, so that synthetic data can boost research in that area as well. Finally, we use this synthetically generated data to test the proposal of applying 6D object pose estimation architectures to grasping region estimation, an idea based on the fact that both problems share several underlying concepts, such as object detection and orientation.
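
The abstract notes that 6D object pose estimation and grasp estimation share underlying concepts. The minimal Python sketch below (illustrative only, not code from the thesis; the pose_matrix helper, the frame names and the numeric values are assumptions) shows the shared representation: both a 6D object pose and a grasp pose reduce to a rotation plus a translation, written here as 4x4 homogeneous transforms that can be composed.

    import numpy as np

    def pose_matrix(rotation, translation):
        """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
        T = np.eye(4)
        T[:3, :3] = rotation
        T[:3, 3] = translation
        return T

    # Hypothetical example: an object rotated 90 degrees about the camera z-axis,
    # placed half a metre in front of the camera.
    theta = np.pi / 2
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    object_in_camera = pose_matrix(Rz, np.array([0.0, 0.0, 0.5]))

    # A grasp can be written the same way, as a gripper pose in the object frame;
    # composing the two transforms gives the gripper pose in the camera frame.
    grasp_in_object = pose_matrix(np.eye(3), np.array([0.0, 0.0, 0.1]))
    grasp_in_camera = object_in_camera @ grasp_in_object
    print(grasp_in_camera)

This composition is one reading of why an architecture designed for 6D object pose estimation could plausibly be reused for grasping region estimation: once the object pose is recovered, a grasp is just another rigid transform anchored to it.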