Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB imagery annotated for global land use/land cover mapping with deep learning (License CC BY 4.0)

  1. Benhammou, Yassir (1)
  2. Alcaraz-Segura, Domingo (2)
  3. Guirado, Emilio (3)
  4. Khaldi, Rohaifa (4)
  5. Tabik, Siham (5)
  1. Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071 Granada, Spain; Systems Analysis and Modeling for Decision Support Laboratory, National School of Applied Sciences of Berrechid, Hassan 1st University, Berrechid 218, Morocco
  2. Department of Botany, Faculty of Science, University of Granada, 18071 Granada, Spain; iEcolab, Inter-University Institute for Earth System Research, University of Granada, 18006 Granada, Spain
  3. Andalusian Center for Assessment and Monitoring of Global Change (CAESCG), University of Almería, 04120 Almería, Spain
  4. Department of Botany, Faculty of Science, University of Granada, 18071 Granada, Spain; ENSIAS, Mohammed V University, Rabat, 10170, Morocco
  5. Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071 Granada, Spain

Publisher: Zenodo

Publication year: 2022

Type: Dataset

CC BY 4.0

Abstract

Sentinel2GlobalLULC is a deep learning-ready dataset of Sentinel-2 RGB images designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTIFF and JPEG formats covering 29 broad LULC classes. Each image is 224 × 224 pixels at 10 m spatial resolution and was produced by assigning to each pixel the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020, in order to remove atmospheric effects (e.g., clouds, aerosols, shadows, snow). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE). The dataset is organized into three main zip-compressed folders, together with an Excel file containing a dictionary of class names and descriptive statistics per LULC class, and a Python script that converts the RGB GeoTIFF images to JPEG. The first folder, "Sentinel2LULC_GeoTiff.zip", contains 29 zip-compressed subfolders, one per LULC class, each holding hundreds to thousands of Sentinel-2 RGB GeoTIFF images. The second folder, "Sentinel2LULC_JPEG.zip", contains 29 zip-compressed subfolders with JPEG versions of the same images.
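A minimal sketch of per-pixel percentile compositing, assuming a single band stored as nested Python lists with `None` for masked or missing observations: each output pixel takes the 25th percentile of that pixel's valid values across acquisition dates. Bright contaminants such as clouds and snow tend to sit in the upper tail of a pixel's time series, so a low percentile tends to skip them. The function names and the linear-interpolation percentile rule are illustrative assumptions; the actual composites were computed in GEE, whose percentile reducer may interpolate differently.

```python
from typing import List, Optional

# One band as rows x cols; None marks a masked or missing observation.
Raster = List[List[Optional[float]]]


def percentile(values: List[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    if not values:
        raise ValueError("percentile of an empty sequence")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def percentile_composite(stack: List[Raster], q: float = 25.0) -> Raster:
    """Per-pixel percentile over a time stack of same-shaped rasters.

    Missing observations are skipped; a pixel with no valid observation stays None.
    """
    if not stack:
        raise ValueError("empty image stack")
    rows, cols = len(stack[0]), len(stack[0][0])
    out: Raster = []
    for r in range(rows):
        row: List[Optional[float]] = []
        for c in range(cols):
            valid = [img[r][c] for img in stack if img[r][c] is not None]
            row.append(percentile(valid, q) if valid else None)
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 1 x 2 band over four dates; 0.90 mimics a bright cloud.
    stack = [[[0.10, 0.20]], [[0.12, 0.90]], [[0.90, 0.22]], [[0.11, None]]]
    print(percentile_composite(stack))
```

In the example, the cloudy 0.90 values do not influence either composited pixel, because both fall above the 25th percentile of their series.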
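The abstract does not spell out how the consensus score is computed, so the following is only one plausible reading, written as a hypothetical sketch. It assumes each product's legend has already been cross-walked to the dataset's classes, defines a pixel's purity as the percentage of products with data at that pixel that agree on the target class, and summarizes an image by its least pure pixel, so that 100% means unanimous agreement everywhere in the tile. The class labels `"Forest"` and `"Shrub"` and both function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical input for one pixel: one label per land-cover product (up to 15),
# already cross-walked to the dataset's classes; None means "no data at this pixel".
PixelLabels = Sequence[Optional[str]]


def pixel_purity(labels: PixelLabels, target: str) -> float:
    """Percentage of products with data at this pixel that agree on `target`."""
    available = [label for label in labels if label is not None]
    if not available:
        return 0.0
    agreeing = sum(1 for label in available if label == target)
    return 100.0 * agreeing / len(available)


def image_purity(pixels: List[PixelLabels], target: str) -> float:
    """Assumed image-level summary: the purity of the least pure pixel,
    so 100.0 means every pixel is unanimously labeled `target`."""
    return min(pixel_purity(labels, target) for labels in pixels)
```

Taking the minimum rather than the mean is a deliberately strict choice: a single disputed pixel is enough to drop a tile below 100%.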
The third folder, "Sentinel2LULC_CSV.zip", includes 29 zip-compressed CSV files, each with one row per image and 12 columns containing the following metadata (the same metadata is also encoded in the image filenames):
  1. Land Cover Class ID: the identification number of each LULC class.
  2. Land Cover Class Short Name: the short name of each LULC class.
  3. Image ID: the identification number of each image within its corresponding LULC class.
  4. Pixel purity Value: the spatial purity of each pixel for its corresponding LULC class, calculated as the spatial consensus across up to 15 land-cover products.
  5. GHM Value: the spatial average of the Global Human Modification index (gHM) over each image.
  6. Latitude: the latitude of the center point of each image.
  7. Longitude: the longitude of the center point of each image.
  8. Country Code: the Alpha-2 country code of each image, as defined in the ISO 3166 international standard. A table of Alpha-2 codes and their countries is available at https://www.iban.com/country-codes.
  9. Administrative Department Level1: the name of the first-level administrative division to which each image belongs.
  10. Administrative Department Level2: the name of the second-level administrative division to which each image belongs.
  11. Locality: the name of the locality to which each image belongs.
  12. Number of S2 images: the number of observations found in the corresponding Sentinel-2 image collection between June 2015 and October 2020 when compositing and exporting the image tile.

For seven LULC classes, we could not export from GEE all images with a spatial purity of 100% because there were millions of them. For these classes, we exported a stratified random sample of 14,000 images and, in addition to the CSV file listing the images actually contained in our dataset, provided a second CSV file.
Thus, for these seven LULC classes, we provide two CSV files:
  1. A CSV file listing all exported images of the class.
  2. A CSV file listing all images available for the class at 100% spatial purity, both exported and not exported, in case users want to export them. The names of these CSV files end with "including_non_downloaded_images".

To document the geographical coverage of the dataset, version v2.1 includes a zip-compressed folder called "Geographic_Representativeness.zip". This folder contains one CSV file per LULC class listing all countries represented in that class. Each CSV file has two columns: the first gives the country code and the second gives the number of images provided for that country in that LULC class. In addition to these 29 CSV files, we provide another CSV file that maps each ISO Alpha-2 country code to its full country name.

© Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is licensed under Attribution 4.0 International (CC BY 4.0)
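For the seven subsampled classes, the images that were not exported can be recovered by comparing the complete listing (the file whose name ends with "including_non_downloaded_images") with the listing of exported images. The sketch below uses only the standard `csv` module and matches rows on a set of key columns. Which columns uniquely identify an image (assumed here to be the center latitude and longitude) and their exact header spelling are assumptions to check against the actual files; the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, str]


def load_rows(lines: Iterable[str]) -> List[Row]:
    """Read CSV text (e.g. a file opened with newline="") into dicts keyed by header."""
    return list(csv.DictReader(lines))


def non_downloaded(all_rows: List[Row], exported_rows: List[Row],
                   key_columns: Sequence[str]) -> List[Row]:
    """Return rows of the complete listing that are absent from the exported listing.

    Rows are matched on the whitespace-stripped text of `key_columns`, which are
    assumed to identify an image uniquely. Values are compared as text, not numbers.
    """
    def key(row: Row) -> Tuple[str, ...]:
        return tuple(row[col].strip() for col in key_columns)

    exported = {key(row) for row in exported_rows}
    return [row for row in all_rows if key(row) not in exported]
```

Because the comparison is textual, coordinates written with different precision in the two files (for example `3.0` versus `3.00`) would not match; rounding both to a common precision before building the keys avoids that.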
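The per-class files in "Geographic_Representativeness.zip" can be combined into dataset-wide image counts per country. This sketch assumes each file has a single header row, the Alpha-2 code in the first column and the image count in the second, and that the code-to-name file uses the same two-column layout; the function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_country_counts(lines: Iterable[str], has_header: bool = True) -> Counter:
    """Parse one per-class file: column 1 = Alpha-2 code, column 2 = image count."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the (assumed) header row
    counts: Counter = Counter()
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        counts[row[0].strip()] += int(row[1])
    return counts


def read_code_to_name(lines: Iterable[str], has_header: bool = True) -> Dict[str, str]:
    """Parse the code-to-name file: column 1 = Alpha-2 code, column 2 = country name."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def total_by_country(per_class: Iterable[Counter],
                     names: Dict[str, str]) -> List[Tuple[str, int]]:
    """Sum counts over all classes; return (country name, images), largest first."""
    total: Counter = Counter()
    for counts in per_class:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return [(names.get(code, code), n) for code, n in total.most_common()]
```

Parsing with the `csv` module keeps every code as a plain string. This matters for Namibia, whose Alpha-2 code is "NA": pandas `read_csv` treats "NA" as a missing value by default unless `keep_default_na=False` is passed.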