Big Earth data: disruptive changes in Earth observation data management and analysis?
Turning Earth observation (EO) data consistently and systematically into valuable global information layers is an ongoing challenge for the EO community. Recently, the term 'big Earth data' emerged to describe massive EO datasets that confront analysts and their traditional workflows with a range of challenges. We argue that the EO community must respond actively to these altered circumstances by evolving its methods, in order to revolutionise the application of EO data in various domains. The disruptive element is that analysts and end-users increasingly rely on Web-based workflows. In this contribution we study selected systems and portals, put them in the context of challenges and opportunities, and highlight selected shortcomings and possible future developments that we consider relevant for the imminent uptake of big Earth data.
Assessing global Sentinel-2 coverage dynamics and data availability for operational Earth observation (EO) applications using the EO-Compass
Sentinel-2 scenes are increasingly being used in operational Earth observation (EO) applications at regional, continental and global scales, in near-real-time applications, and with multi-temporal approaches. On a broader scale, they are therefore one of the most important facilitators of the Digital Earth. However, data quality and availability are not spatially and temporally homogeneous due to effects related to cloudiness, the position on the Earth or the acquisition plan. This spatio-temporal inhomogeneity of the underlying data may affect any large-scale remote sensing analysis and is therefore important to consider. This study presents an assessment of the metadata for all accessible Sentinel-2 Level-1C scenes acquired in 2017, enabling the spatio-temporal coverage and availability to be quantified, including scene availability and cloudiness. Spatial exploratory analysis of the global, multi-temporal metadata also reveals that higher acquisition frequencies do not necessarily yield more cloud-free scenes and exposes metadata quality issues, e.g. systematically incorrect cloud cover estimation in high, non-vegetated altitudes. The continuously updated datasets and analysis results are accessible as a Web application called EO-Compass. It contributes to a better understanding and selection of Sentinel-2 scenes, and improves the planning and interpretation of remote sensing analyses.
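The kind of per-tile summary described above can be sketched as follows. This is a minimal illustration, not the EO-Compass implementation: the record layout (tile ID plus cloud cover percentage), the sample values, and the 10% cloud-free threshold are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical metadata records: (tile_id, cloud_cover_percent).
# Real Sentinel-2 metadata carries many more fields; these values are invented.
SCENES = [
    ("T1", 95.0), ("T1", 80.0), ("T1", 60.0), ("T1", 5.0),
    ("T2", 2.0), ("T2", 8.0),
]


def summarise(scenes, cloud_free_max=10.0):
    """Return {tile_id: (scene_count, cloud_free_count)}.

    cloud_free_max is an assumed threshold in percent, not a value from the study.
    """
    counts = defaultdict(lambda: [0, 0])
    for tile_id, cloud_cover in scenes:
        counts[tile_id][0] += 1
        if cloud_cover <= cloud_free_max:
            counts[tile_id][1] += 1
    return {tile: tuple(pair) for tile, pair in counts.items()}


summary = summarise(SCENES)
for tile, (total, clear) in sorted(summary.items()):
    print(f"{tile}: {total} scenes, {clear} cloud-free")
```

In this invented sample, tile T1 is acquired twice as often as T2 yet yields fewer cloud-free scenes, which is the pattern the abstract reports for acquisition frequency.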
Modelling changing population distributions: an example of the Kenyan Coast, 1979-2009
Large-scale gridded population datasets are usually produced for the year of input census data using a top-down approach and projected backward and forward in time using national growth rates. Such temporal projections do not include any subnational variation in population distribution trends and ignore changes in geographical covariates such as urban land cover changes. Improved predictions of population distribution changes over time require the use of a limited number of covariates that are time-invariant or temporally explicit. Here we make use of recently released multi-temporal high-resolution global settlement layers, historical census data and the latest developments in population distribution modelling methods to reconstruct population distribution changes over 30 years across the Kenyan Coast. We explore the methodological challenges associated with the production of gridded population distribution time-series in data-scarce countries and show that trade-offs have to be found between spatial and temporal resolutions when selecting the best modelling approach. Strategies used to fill data gaps may vary according to the local context and the objective of the study. This work will hopefully serve as a benchmark for future developments of population distribution time-series, which are increasingly required for population-at-risk estimations and spatial modelling in various fields.
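The limitation of projecting with a single national growth rate can be illustrated with a short sketch. It assumes compound annual growth, which the abstract does not specify; the unit names, census counts and the 3% rate are invented for the example.

```python
def project(counts, annual_rate, years):
    """Scale every unit by the same national growth factor.

    Compound annual growth is an assumption; years may be negative
    to project backward in time.
    """
    factor = (1.0 + annual_rate) ** years
    return {unit: value * factor for unit, value in counts.items()}


def shares(counts):
    """Fraction of the total population found in each unit."""
    total = sum(counts.values())
    return {unit: value / total for unit, value in counts.items()}


# Hypothetical census counts for two subnational units.
census = {"unit_a": 120_000.0, "unit_b": 80_000.0}
projected = project(census, annual_rate=0.03, years=10)

print(shares(census))
print(shares(projected))
```

Because every unit is multiplied by the same factor, the subnational shares are unchanged by the projection, so the approach cannot express shifts in distribution such as faster growth in urbanising areas.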
Automatic sub-pixel co-registration of Landsat-8 OLI and Sentinel-2A MSI images using phase correlation and machine learning based mapping
This study investigates misregistration issues between Landsat-8/OLI and Sentinel-2A/MSI at 30 m resolution, and between multi-temporal Sentinel-2A images at 10 m resolution, using a phase correlation approach and multiple transformation functions. Co-registration of 45 Landsat-8 to Sentinel-2A pairs and 37 Sentinel-2A to Sentinel-2A pairs was analyzed. Phase correlation proved to be a robust approach that allowed us to identify hundreds and thousands of control points on images acquired more than 100 days apart. Overall, misregistration of up to 1.6 pixels at 30 m resolution between Landsat-8 and Sentinel-2A images, and of up to 1.2 pixels and 2.8 pixels at 10 m resolution between multi-temporal Sentinel-2A images from the same and different orbits, respectively, was observed. The non-linear Random Forest regression used for constructing the mapping function showed the best results in terms of root mean square error (RMSE), yielding an average RMSE of 0.07±0.02 pixels at 30 m resolution, and 0.09±0.05 and 0.15±0.06 pixels at 10 m resolution for the same and adjacent Sentinel-2A orbits, respectively, for multiple tiles and multiple conditions. A simpler first-order polynomial function (affine transformation) yielded an RMSE of 0.08±0.02 pixels at 30 m resolution, and 0.12±0.06 (same Sentinel-2A orbits) and 0.20±0.09 (adjacent orbits) pixels at 10 m resolution.
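The core idea of phase correlation can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the study's pipeline: it recovers integer-pixel circular shifts only (the sub-pixel refinement and the mapping functions are omitted), it uses a naive DFT from the standard library to stay dependency-free where real workflows would use an FFT library, and the 8×8 random test image is invented.

```python
import cmath
import math
import random


def _dft1(values, inverse=False):
    """Naive 1D discrete Fourier transform (O(n^2)); fine for tiny demos."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def dft2(image, inverse=False):
    """2D DFT computed as row transforms followed by column transforms."""
    rows = [_dft1(row, inverse) for row in image]
    cols = [_dft1(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def estimate_shift(reference, moved):
    """Return (dy, dx) such that moved is reference shifted by that many pixels."""
    f = dft2(reference)
    g = dft2(moved)
    cross = []
    for f_row, g_row in zip(f, g):
        row = []
        for a, b in zip(f_row, g_row):
            product = a.conjugate() * b
            magnitude = abs(product)
            # Keep only the phase; zero out near-empty frequencies.
            row.append(product / magnitude if magnitude > 1e-12 else 0j)
        cross.append(row)
    correlation = dft2(cross, inverse=True)
    height, width = len(correlation), len(correlation[0])
    peak_y, peak_x = max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda p: correlation[p[0]][p[1]].real,
    )
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    dy = peak_y - height if peak_y > height // 2 else peak_y
    dx = peak_x - width if peak_x > width // 2 else peak_x
    return dy, dx


# Invented test image and a known circular shift of 2 rows down, 3 columns left.
rng = random.Random(0)
reference = [[rng.random() for _ in range(8)] for _ in range(8)]
true_dy, true_dx = 2, -3
moved = [[reference[(y - true_dy) % 8][(x - true_dx) % 8] for x in range(8)]
         for y in range(8)]
print(estimate_shift(reference, moved))
```

Normalising the cross-power spectrum to unit magnitude discards image contrast and keeps only the phase difference, so the inverse transform concentrates into a single peak at the displacement. That property is one plausible reason the approach remains usable on image pairs acquired far apart in time.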
Semantic and syntactic interoperability in online processing of big Earth observation data
The challenge of enabling syntactic and semantic interoperability for comprehensive and reproducible online processing of big Earth observation (EO) data is still unsolved. Supporting both types of interoperability is one of the requirements for efficiently extracting valuable information from the large amount of available multi-temporal gridded data sets. The proposed system wraps world models (semantic interoperability) into OGC Web Processing Services (syntactic interoperability) for semantic online analyses. World models describe spatio-temporal entities and their relationships in a formal way. The proposed system serves as an enabler for (1) technical interoperability, using a standardised interface that can be used by all types of clients, and (2) collaboration, allowing experts from different domains to develop complex analyses together. Users connect the world models online to the data, which are maintained in a centralised storage as 3D spatio-temporal data cubes. The system also allows non-experts to extract valuable information from EO data because data management, low-level interactions and specific software issues can be ignored. We discuss the concept of the proposed system, provide a technical implementation example, describe three use cases for extracting changes from EO images, and demonstrate its usability for non-EO, gridded, multi-temporal data sets (CORINE land cover).
Spatial and temporal intercomparison of four global burned area products
We characterize the agreement and disagreement of four publicly available burned area products (Fire CCI, Copernicus Burnt Area, MODIS MCD45A1, and MODIS MCD64A1) at a finer spatial and temporal scale than previous assessments, using a grid of three-dimensional cells defined both in space and in time. Our analysis, conducted using seven years of data (2005-2011), shows that estimates of burned area vary greatly between products in terms of total area burned, the location of burning, and the timing of the burning. We use regional and monthly units for analysis to provide insight into the variation between products that can be lost when considering products yearly and/or globally. Comparison with independent, contemporaneous MODIS active fire observations provides one indication of which products most reasonably capture the burning regime. Our results have implications for the use of global burned area products in fire ecology, management and emissions applications.
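One way to picture a comparison over space-time cells is sketched below. The abstract does not state which agreement measures were used, so the Jaccard index here is an assumption, as is the reduction of each product to a set of burned (col, row, month) cells. The product names are taken from the text, but the cell data are invented.

```python
from itertools import combinations

# Invented burned-cell sets keyed by product name; each cell is (col, row, month).
PRODUCTS = {
    "MCD64A1": {(0, 0, 1), (0, 1, 1), (1, 1, 2)},
    "MCD45A1": {(0, 0, 1), (1, 1, 2)},
    "Fire CCI": {(0, 0, 1), (0, 1, 2)},
}


def jaccard(a, b):
    """Share of cells flagged by either product that both products flag."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(products):
    """Jaccard index for every pair of products, keyed by sorted name pair."""
    return {
        (p, q): jaccard(products[p], products[q])
        for p, q in combinations(sorted(products), 2)
    }


for pair, score in pairwise_agreement(PRODUCTS).items():
    print(pair, round(score, 2))
```

Because the month is part of the cell key, two products that flag the same location in different months count as disagreeing. Differences in timing would be hidden if the cells were aggregated to yearly totals.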
Exposing the urban continuum: Implications and cross-comparison from an interdisciplinary perspective
There is an increasing availability of geospatial data describing patterns of human settlement and population, such as various global remote-sensing-based built-up land layers, fine-grained census-based population estimates, and publicly available cadastral and building footprint data. This development creates new integrative modelling opportunities to characterize the continuum of urban, peri-urban, and rural settlements and populations. However, little research has been done on the agreement between such data products in measuring human presence, which each product captures through a different proxy variable (i.e., presence of built-up structures derived from different remote sensors, census-derived population counts, or cadastral land parcels). In this work, we quantitatively evaluate and cross-compare the ability of such data to model the urban continuum, using a unique, integrated validation database of cadastral and building footprint data, U.S. census data, and three different versions of the Global Human Settlement Layer (GHSL) derived from remotely sensed data. We identify advantages and shortcomings of these data types across different geographic settings in the U.S., which will inform future data users on implications of data accuracy and suitability for a given application, even in data-poor regions of the world.
Using geospatial social media data for infectious disease studies: a systematic review
Geospatial social media (GSM) data has been increasingly used in public health, particularly in infectious disease research, due to its rich, timely, and accessible spatial information. This review synthesized 86 research articles, published between December 2013 and March 2022, that use GSM data to study infectious diseases. These articles cover 12 infectious disease types, ranging from respiratory infectious diseases to sexually transmitted diseases, at spatial levels ranging from neighborhood to county, state, and country. We categorized these studies into three major infectious disease research domains: surveillance, explanation, and prediction. With the assistance of advanced statistical and spatial methods, GSM data has been applied widely and in depth across these domains, particularly in surveillance and explanation. We further identified four knowledge gaps in terms of contextual information use, application scopes, spatiotemporal dimension, and data limitations, and proposed innovation opportunities for future research. Our findings will contribute to a better understanding of the use of GSM data in infectious disease studies and provide insights into strategies for using GSM data more effectively in future research.