Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB imagery acquired between June 2015 and October 2020 annotated for global land use/land cover mapping with deep learning (License CC BY 4.0)
- Benhammou, Yassir 1
- Alcaraz-Segura, Domingo 2
- Guirado, Emilio 3
- Khaldi, Rohaifa 4
- Tabik, Siham 5
- 1 Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071, Granada, Spain. Systems Analysis and Modeling for Decision Support Laboratory, National School of Applied Sciences of Berrechid, Hassan 1st University, Berrechid 218, Morocco
- 2 Department of Botany, Faculty of Science, University of Granada, 18071 Granada, Spain. iEcolab, Inter-University Institute for Earth System Research, University of Granada, 18006 Granada, Spain
- 3 Andalusian Center for Assessment and Monitoring of Global Change (CAESCG), University of Almería, 04120 Almería, Spain
- 4 Department of Botany, Faculty of Science, University of Granada, 18071 Granada, Spain. ENSIAS, Mohammed V University, Rabat, 10170, Morocco
- 5 Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071, Granada, Spain
Publisher: Zenodo
Publication year: 2022
Type: Dataset
Abstract
Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.0 contains 194,877 images in GeoTIFF and JPEG format corresponding to 29 broad LULC classes. Each image has 224 × 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020, in order to remove atmospheric effects (e.g., clouds, aerosols, shadows, snow). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE). The dataset is structured into three main zip-compressed folders, an Excel file with a dictionary of class names and descriptive statistics per LULC class, and a Python script to convert RGB GeoTIFF images into JPEG format. The first folder contains 29 zip-compressed subfolders, each corresponding to a specific LULC class with hundreds to thousands of GeoTIFF Sentinel-2 RGB images. The second folder contains 29 zip-compressed subfolders with a JPEG version of the same images provided in the first folder.
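The percentile compositing mentioned above can be illustrated with a small, self-contained sketch. It is not the authors' GEE workflow: the function names, the toy reflectance values, and the use of linear interpolation between ranks are assumptions made for illustration only. The idea it shows is that a low percentile taken per pixel over time tends to discard bright outliers such as clouds.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values.

    Linear interpolation between neighbouring ranks is an assumption here;
    other percentile definitions give slightly different results.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no observations")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def percentile_composite(stack, q=25):
    """Collapse a time stack of equally sized 2-D bands into one band.

    stack is a list of images; each image is a list of rows of pixel values.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [percentile([image[r][c] for image in stack], q) for c in range(cols)]
        for r in range(rows)
    ]


# Five made-up observations of a 1 x 2 band; the 0.90 values mimic clouds.
stack = [
    [[0.10, 0.20]],
    [[0.12, 0.90]],
    [[0.90, 0.22]],
    [[0.11, 0.21]],
    [[0.13, 0.90]],
]
composite = percentile_composite(stack)
print(composite)
```

In this toy stack the cloudy values never reach the composite, because they sit at the top of each pixel's sorted time series.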
The third folder includes 29 zip-compressed CSV files with as many rows as images and with 12 columns containing the following metadata (the same metadata is also provided in the image filenames):
- Land Cover Class ID: the identification number of each LULC class
- Land Cover Class Short Name: the short name of each LULC class
- Pixel Purity Value: the spatial purity of each pixel for its corresponding LULC class, calculated as the spatial consensus across up to 15 land-cover products
- Image ID: the identification number of each image within its corresponding LULC class
- GHM Value: the spatial average of the Global Human Modification index (gHM) for each image
- Latitude: the latitude of the center point of each image
- Longitude: the longitude of the center point of each image
- Country Code: the Alpha-2 country code of each image as described in the ISO 3166 international standard. The Alpha-2 code of each country is listed at https://www.iban.com/country-codes
- Administrative Department Level1: the name of the administrative level 1 unit to which each image belongs
- Administrative Department Level2: the name of the administrative level 2 unit to which each image belongs
- Locality: the name of the locality to which each image belongs
- Number of S2 images: the number of instances found in the corresponding Sentinel-2 image collection between June 2015 and October 2020 that were aggregated and exported as the final image
For seven LULC classes, we could not export from GEE all images that reached a spatial purity of 100%, since there were millions of them. For these classes, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset.
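As an illustration of how the per-class CSV metadata described above might be used, the following sketch reads a metadata table and keeps only images with full spatial purity and low human modification. The header names, the sample rows, and the 0.1 gHM threshold are all hypothetical; the actual column headers in the dataset's CSV files may differ and should be checked before use.

```python
import csv
import io

# Made-up rows for illustration only. The header names are assumptions and
# may not match the headers used in the dataset's CSV files.
SAMPLE_CSV = """class_id,class_short_name,purity,image_id,ghm,latitude,longitude,country_code,admin1,admin2,locality,n_s2_images
1,ExampleClass,100,1,0.02,23.5,12.1,DZ,ExampleRegion,ExampleProvince,ExampleTown,310
1,ExampleClass,95,2,0.40,24.0,13.0,LY,ExampleRegion,ExampleProvince,ExampleTown,280
"""


def load_metadata(handle):
    """Parse metadata rows, converting the numeric fields used below."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "image_id": int(row["image_id"]),
            "purity": float(row["purity"]),
            "ghm": float(row["ghm"]),
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
            "country_code": row["country_code"],
        })
    return rows


def select(rows, min_purity=100.0, max_ghm=0.1):
    """Keep rows with purity >= min_purity and gHM <= max_ghm."""
    return [
        r for r in rows
        if r["purity"] >= min_purity and r["ghm"] <= max_ghm
    ]


rows = load_metadata(io.StringIO(SAMPLE_CSV))
selected = select(rows)
print([r["image_id"] for r in selected])
```

One practical note for users who prefer pandas over the `csv` module: the ISO 3166 Alpha-2 code for Namibia is `NA`, which `pandas.read_csv` interprets as a missing value unless `keep_default_na=False` is passed.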
For each of these seven LULC classes, we provide two CSV files:
- A CSV file that contains all exported images for this class
- A CSV file that contains all images available for this class at a spatial purity of 100%, both the exported and the non-exported ones, in case the user wants to export them. The names of these CSV files end with "including_non_downloaded_images".
© Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC BY 4.0)