An APIfication Approach to Facilitate the Access and Reuse of Open Data

  1. González Mora, César
Supervised by:
  1. Irene Garrigós Fernández Supervisor
  2. José J. Zubcoff Vallejo Supervisor

Defense university: Universitat d'Alacant / Universidad de Alicante

Defense date: 10 September 2021

Committee:
  1. Manuel Wimmer Chair
  2. Elena Lloret Pastor Secretary
  3. Sven Casteleyn Member
Department:
  1. LENGUAJES Y SISTEMAS INFORMATICOS

Type: Thesis

Teseo: 679713

Abstract

Nowadays, there is a tendency to publish data on the Web, owing to the benefits it brings to society and to new legislation that encourages the opening of data. These collections of open data, also known as datasets, are typically published in open data portals by governments and institutions around the world in order to make them open, that is, available on the Web in a free and reusable manner. Publishers commonly expose their data as individual tabular datasets. Open data is considered highly valuable because promoting the use of public information produces transparency, innovation and other social, political and economic benefits. This importance is especially notable in situational scenarios, where a small group of consumers (developers or data scientists) with specific needs require thematic data for a short life cycle. There are different mechanisms that let these data consumers easily assess whether the data is adequate for their purpose. For example, SPARQL endpoints have become very useful for the consumption of open data, particularly Linked Open Data (LOD). Moreover, Web Application Programming Interfaces (APIs) are a highly recommended feature of open data portals, since they give access to open data in a straightforward manner. However, accessing this open data is difficult because current open data platforms do not generally provide suitable strategies to access their data. On the one hand, accessing open data through SPARQL endpoints requires knowledge of several technologies, which is challenging especially for novice developers. Moreover, LOD is not usually available, since the most used formats in government open data portals are tabular. On the other hand, although Web APIs would make it easier for developers to access and reuse open data from open data portals' catalogs, there is a lack of suitable Web APIs in open data portals.
Moreover, in most cases, the currently available APIs only allow access to the catalog's metadata or the download of entire data resources (i.e. coarse-grained access to data), which hampers the reuse of data. In addition, as open data is commonly published individually without considering potential relationships with other datasets, reusing several open datasets together is not a trivial task; it requires mechanisms that allow data consumers to integrate and access tabular open data published on the Web. Therefore, open data is not being used to its full potential because it is not easily accessible. As access to open data is thus still limited for end users, particularly those without programming skills, we propose a model-based approach to automatically generate Web APIs from open data. This APIfication approach takes into account the access to multiple integrated tabular datasets and the consumption of data in situational scenarios. Firstly, we focus on data that can be integrated by means of join and union operations. Then, we coin the term disposable Web APIs as an alternative mechanism for the consumption of open data in situational scenarios. These disposable Web APIs are created on the fly to be used temporarily by a user to consume specific open data. Accordingly, the main objective is to provide suitable mechanisms to easily access and reuse open data on the fly and in an integrated manner, solving the problem of difficult access through SPARQL endpoints for most data consumers and the lack of suitable Web APIs with easy access to open data. With this approach, we address both open data publishers and consumers: publishers will be able to include a Web API with their data, and data consumers or reusers will benefit in those cases where a Web API pointing to the open data is missing.
The results of the experiments conducted led us to conclude that users consider our generated Web APIs easy to use and that they provide the desired open data, even when it comes from different datasets, and especially in situational scenarios.
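
To make the integration step mentioned in the abstract more concrete, the following is a minimal sketch of how tabular datasets might be combined through union and join operations and then queried at row level instead of being downloaded whole. It is an illustration only and not the model-based generator developed in the thesis: the sample data, the column names and the functions `load_rows`, `union`, `join` and `query` are all assumptions introduced here.

```python
import csv
import io

# Hypothetical sample data standing in for tabular open datasets.
STATIONS_CSV = """station_id,city
s1,Alicante
s2,Elche
"""
READINGS_2020_CSV = """station_id,year,no2
s1,2020,21
s2,2020,18
"""
READINGS_2021_CSV = """station_id,year,no2
s1,2021,19
"""


def load_rows(text):
    """Parse CSV text into a list of dict rows (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def union(*tables):
    """Union: concatenate tables that share exactly the same columns."""
    columns = None
    rows = []
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError("union requires identical columns")
            rows.append(dict(row))
    return rows


def join(left, right, key):
    """Inner join of two tables on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def query(rows, **filters):
    """Fine-grained access: keep rows matching every column=value filter."""
    return [r for r in rows if all(r.get(c) == v for c, v in filters.items())]


# Integrate two yearly datasets (union), then enrich them with station data (join).
readings = union(load_rows(READINGS_2020_CSV), load_rows(READINGS_2021_CSV))
integrated = join(readings, load_rows(STATIONS_CSV), key="station_id")
alicante = query(integrated, city="Alicante")
```

A generated Web API could expose a function such as `query` behind an HTTP endpoint, so that a consumer asks only for the rows of interest; how the thesis maps datasets to endpoints is not described in this record.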