TimeSpec4LULC: A Smart-Global Dataset of Multi-Spectral Time Series of MODIS Terra-Aqua from 2000 to 2021 for Training Machine Learning models to perform LULC Mapping
- Khaldi, Rohaifa 1
- Alcaraz-Segura, Domingo 2
- Guirado, Emilio 3
- Benhammou, Yassir 1
- Tabik, Siham 1
- 1 Dept. of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071, Granada, Spain
- 2 Dept. of Botany, Faculty of Science, University of Granada, 18071 Granada, Spain. iEcolab, Inter-University Institute for Earth System Research, University of Granada, 18006 Granada, Spain
- 3 Multidisciplinary Institute for Environment Studies "Ramón Margalef", University of Alicante, 03690, Spain
Publisher: Zenodo
Year of publication: 2022
Type: Dataset
Abstract
TimeSpec4LULC is a smart open-source global dataset of multi-spectral time series for 29 Land Use and Land Cover (LULC) classes, ready to train machine learning models. It was built from the seven spectral bands of the MODIS sensors at 500 m resolution from 2000 to 2021 (262 observations in each time series). It was then annotated using spatial-temporal agreement across the 15 global LULC products available in Google Earth Engine (GEE).
TimeSpec4LULC contains two datasets: the original dataset, distributed over 6,076,531 pixels, and a balanced subset of the original dataset, distributed over 29,000 pixels.
The original dataset contains 30 folders: "Metadata" and 29 folders corresponding to the 29 LULC classes. The "Metadata" folder holds 29 CSV files describing the metadata of the 29 LULC classes. The remaining 29 folders contain the time series data for the 29 LULC classes. Each of these folders holds 262 CSV files corresponding to the 262 months. Each CSV file provides the seven spectral band values as well as the coordinates for all the pixels of that LULC class.
The balanced subset contains the metadata and the time series data for 1,000 pixels per class, representative of the globe. It holds 29 JSON files named after the 29 LULC classes. The features of the dataset are:
- ".geo": the geometry and coordinates (longitude and latitude) of the pixel center.
- "ADM0_Code": the GAUL country code.
- "ADM1_Code": the GAUL first-level administrative unit code.
- "GHM_Index": the average of the global human modification index.
- "Products_Agreement_Percentage": the agreement percentage over the 15 global LULC products available in GEE.
- "Temporal_Availability_Percentage": the percentage of non-missing values in each band.
- "Pixel_TS": the time series values of the seven spectral bands.
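As a rough illustration of how the balanced subset might be inspected, the Python sketch below checks that a pixel record carries the seven feature names listed in the abstract. Several things here are assumptions rather than facts from the dataset documentation: that each class file is a JSON array of pixel records, the nesting of "Pixel_TS" and "Temporal_Availability_Percentage", and the band keys `band_1` to `band_7`. The helpers `load_records` and `missing_features` are hypothetical names introduced for this example, and the sample record is synthetic.

```python
import json

# Feature names as listed in the dataset description.
EXPECTED_FEATURES = [
    ".geo",
    "ADM0_Code",
    "ADM1_Code",
    "GHM_Index",
    "Products_Agreement_Percentage",
    "Temporal_Availability_Percentage",
    "Pixel_TS",
]

N_CLASSES = 29           # LULC classes
PIXELS_PER_CLASS = 1000  # balanced subset size per class
N_BANDS = 7              # MODIS spectral bands
N_MONTHS = 262           # monthly observations per time series


def load_records(path):
    """Load one class file.

    ASSUMPTION: the file is a JSON array of pixel records. The real
    layout may differ, so adapt this after inspecting an actual file.
    """
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_features(record):
    """Return the documented feature names that are absent from a record."""
    return [name for name in EXPECTED_FEATURES if name not in record]


# Synthetic record for illustration only. All values, the band keys
# ("band_1" ... "band_7") and the nesting are made up, not taken from the data.
example_record = {
    ".geo": {"type": "Point", "coordinates": [-3.60, 37.18]},
    "ADM0_Code": 0,
    "ADM1_Code": 0,
    "GHM_Index": 0.0,
    "Products_Agreement_Percentage": 100.0,
    "Temporal_Availability_Percentage": {
        f"band_{i}": 100.0 for i in range(1, N_BANDS + 1)
    },
    "Pixel_TS": {
        f"band_{i}": [0.0] * N_MONTHS for i in range(1, N_BANDS + 1)
    },
}

# Total pixels in the balanced subset: 29 classes x 1,000 pixels each.
balanced_total = N_CLASSES * PIXELS_PER_CLASS

print(missing_features(example_record))  # prints []
print(balanced_total)                    # prints 29000
```

A check like this is only a starting point: before relying on it, open one of the 29 JSON files and adjust `load_records` and the expected nesting to match what the file actually contains.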