An ever-increasing volume of Earth data is being gathered. These data are “big” not only in size but also in their complexity, different formats, and varied scientific disciplines. As such, big data are disrupting traditional research. New methods and platforms, such as the cloud, are tackling these new challenges.
Big Data Analytics in Earth, Atmospheric, and Ocean Sciences explores new tools for the analysis and display of the rapidly increasing volume of data about the Earth.
The rapid growth of remote sensing big data (RSBD) has attracted considerable attention from both academia and industry. Despite the progress of computer technologies, conventional computing implementations have become technically inefficient for processing RSBD. Cloud computing is effective in activating and mining large-scale heterogeneous data and has been widely applied to RSBD in recent years. This study performs a technical review of cloud-based RSBD storage and computing from an interdisciplinary viewpoint of remote sensing and computer science. First, we elaborate on four critical technical challenges resulting from the scale expansion of RSBD applications, i.e., raster storage, metadata management, data homogeneity, and computing paradigms. Second, we introduce state-of-the-art cloud-based data management technologies for RSBD storage. The unit for manipulating remote sensing data, which we call the RSBD data model, has evolved with the expansion in scale and the adoption of novel technologies. Four data models are suggested, i.e., scenes, analysis-ready data (ARD), data cubes, and composite layers. Third, we summarize recent research on the application of various cloud-based parallel computing technologies to RSBD computing implementations. Finally, we categorize the architectures of mainstream RSBD platforms. This research provides a comprehensive review of the fundamental issues of RSBD for computing experts and remote sensing researchers.
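To make the difference between the scene and data cube models concrete, the following is a minimal Python sketch, not taken from the review itself. It assumes the scenes have already been resampled to a common grid (roughly what ARD provides); the dictionary layout and the function name `build_data_cube` are hypothetical and chosen only for illustration.

```python
import numpy as np

def build_data_cube(scenes):
    """Stack co-registered scenes into a (time, y, x) cube, ordered by date.

    Assumption: every scene is a dict with an ISO date string and a 2-D
    pixel array on the same grid. This layout is hypothetical.
    """
    ordered = sorted(scenes, key=lambda s: s["date"])
    dates = [s["date"] for s in ordered]
    cube = np.stack([s["pixels"] for s in ordered], axis=0)
    return dates, cube

# Two tiny synthetic scenes, deliberately given out of date order.
scenes = [
    {"date": "2021-02-01", "pixels": np.full((2, 2), 0.4)},
    {"date": "2021-01-01", "pixels": np.full((2, 2), 0.2)},
]

dates, cube = build_data_cube(scenes)

# In the cube model, a per-pixel time series is a single slice,
# whereas the scene model requires opening every file in turn.
pixel_series = cube[:, 0, 0]
```

The sketch only illustrates why a homogeneous cube simplifies time-series access; real platforms add chunked storage, metadata indexing, and lazy loading on top of this idea.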
Nearly a petabyte of NASA’s Physical Oceanography Distributed Active Archive Center (PO.DAAC) data products have been moved to NASA’s Earthdata Cloud, which is hosted in the Amazon Web Services (AWS) cloud. To realize the full potential of cloud computing on Big Data, one needs to be familiar not only with the data products and their access methods, but also with the new set of knowledge that comes with working in a cloud environment. This can be a daunting task for the majority of the science community, who may be familiar with high-performance computing but not with AWS services. To help end users learn and succeed during this paradigm shift, the PO.DAAC team has been exploring pathways toward practical solutions for research groups that go beyond data access and into data analysis in the cloud.
During this webinar, we will share some preliminary findings of this PO.DAAC work. We will assume participants have no prior knowledge of AWS services or the Earthdata Cloud, and will present a step-by-step walkthrough of exploring and discovering PO.DAAC data hosted in the Earthdata Cloud and of applying AWS cloud computing to analyze global sea level rise from altimetry data and Estimating the Circulation and Climate of the Ocean (ECCO) products.
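As a rough illustration of the kind of analysis step such a walkthrough involves, the sketch below computes an area-weighted global mean from gridded sea level anomalies and fits a linear trend. It is not the PO.DAAC workflow: the data are synthetic, the rise of 2.0 units per year is an arbitrary input rather than a result, and the function names are hypothetical. Reading real granules from the Earthdata Cloud is deliberately left out.

```python
import numpy as np

def global_mean(field, lats):
    """Area-weighted mean of a (lat, lon) field, skipping NaN (land) cells.

    Grid cells shrink toward the poles, so each row is weighted by cos(latitude).
    """
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    valid = ~np.isnan(field)
    return float(np.sum(field[valid] * weights[valid]) / np.sum(weights[valid]))

def linear_trend(times, values):
    """Least-squares slope of values against times."""
    slope, _intercept = np.polyfit(times, values, 1)
    return float(slope)

# Synthetic input: a uniform rise of 2.0 units per year (arbitrary, not a result).
lats = np.array([-60.0, 0.0, 60.0])
years = np.array([2000.0, 2001.0, 2002.0, 2003.0])
fields = [np.full((3, 4), 2.0 * (year - 2000.0)) for year in years]
fields[1][0, 0] = np.nan  # pretend one cell is land

series = [global_mean(f, lats) for f in fields]
trend = linear_trend(years, series)
```

With real altimetry the same two steps apply after the gridded files are opened; the cosine weighting matters there because the anomalies are not spatially uniform.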
Often combined with other traditional and non-traditional types of data, geospatial sensing data have a crucial role in public health studies. We conducted a systematic narrative review to broaden our understanding of the usage of big geospatial sensing, ancillary data, and related spatial data infrastructures in public health studies. Methods: English-language original research articles published during the last ten years were retrieved from three leading bibliographic databases (i.e., PubMed, Scopus, and Web of Science) in April 2022. Study quality was assessed by following well-established practices in the literature. Results: A total of thirty-two articles were identified through the literature search. We observed that the included studies used various data-driven approaches to make better use of geospatial big data, focusing on a range of health and health-related topics. We found that the terms ‘big’ geospatial data and geospatial ‘big data’ have been used inconsistently in the existing geospatial sensing studies focusing on public health. We also learned that the existing research made good use of spatial data infrastructures (SDIs) for geospatial sensing data but did not fully use health SDIs for research. Conclusions: This study reiterates the importance of interdisciplinary collaboration as a prerequisite to fully taking advantage of geospatial big data for future public health studies.
Recent advances in cloud-based remote sensing platforms have revolutionized the routines for remote sensing big data (RSBD) analysis. However, it is challenging to make user-defined algorithms reusable for RSBD applications if they are not pre-implemented in RSBD platforms, especially legacy algorithms written with specific programming languages and libraries. In recent years, the emergence of containerization, which is the core feature of cloud-native computing, has provided effective solutions for porting user-defined algorithms to the cloud environment. In this research, we present a novel approach to deploying user-defined remote sensing algorithms for large-scale analysis based on Data Cube and cloud-native containerization. A processing model is introduced to organize workflows of remote sensing analysis based on Data Cube. The workflows can be decomposed into multiple independent steps and parallelizable tasks, following the homogeneity of Data Cube and the parallelizability of remote sensing analysis. Subsequently, the Composite Container is designed to process tasks with user-defined algorithms as if they were built-in algorithms. Then, we introduce the Data Cube Resilient Distributed Dataset (DRDD) to implement workflows with Composite Containers following the MapReduce paradigm. The proposed approach was implemented with Science Earth Platform and validated with two sets of continental-scale land cover mapping at resolutions of up to 10 m. Experimental results show that the proposed approach can effectively implement remote sensing analysis with user-defined algorithms and performs well for continental-scale analysis.
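The decomposition described above (split a homogeneous cube into independent tasks, apply a user-defined algorithm to each, then merge) can be sketched in a few lines of Python. This is an illustrative stand-in only: a thread pool takes the place of containers and a distributed dataset, it is not the authors' Composite Container or DRDD implementation, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def split_into_tiles(cube, tile_size):
    """Yield ((row, col), tile) pairs that cover a (time, y, x) cube."""
    _, height, width = cube.shape
    for r in range(0, height, tile_size):
        for c in range(0, width, tile_size):
            yield (r, c), cube[:, r:r + tile_size, c:c + tile_size]

def user_algorithm(tile):
    """Stand-in for a user-defined algorithm: per-pixel temporal median."""
    return np.median(tile, axis=0)

def run_workflow(cube, tile_size, algorithm):
    """Map the algorithm over tiles in parallel, then reduce into one mosaic."""
    tiles = list(split_into_tiles(cube, tile_size))
    with ThreadPoolExecutor() as pool:
        # Map step: each tile is an independent task.
        results = list(pool.map(lambda item: (item[0], algorithm(item[1])), tiles))
    # Reduce step: write each tile result back at its original offset.
    mosaic = np.empty(cube.shape[1:], dtype=results[0][1].dtype)
    for (r, c), out in results:
        h, w = out.shape
        mosaic[r:r + h, c:c + w] = out
    return mosaic

cube = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)
result = run_workflow(cube, tile_size=2, algorithm=user_algorithm)
```

Because the algorithm here is purely per-pixel, the tiled result equals the result of running it on the whole cube; algorithms that need neighbouring pixels would require overlapping tiles, which this sketch does not handle.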
Geospatial data and related technologies have become increasingly important in data analysis processes and play a prominent role in most of them. The serverless paradigm has become the most popular and frequently used technology within cloud computing. This paper reviews the serverless paradigm and examines how it could be leveraged for geospatial data processes by using open standards in the geospatial community. We propose a system design and architecture to handle complex geospatial data processing jobs with minimum human intervention and resource consumption using serverless technologies. In order to define and execute workflows in the system, we also propose new models for both workflow and task definitions. Moreover, the proposed system has new web services based on the Open Geospatial Consortium (OGC) Application Programming Interface (API) Processes specification to provide interoperability with other geospatial applications, in anticipation that the specification will be more commonly used in the future. We implemented the proposed system on one of the public cloud providers as a proof of concept and evaluated it with sample geospatial workflows and cloud architecture best practices.
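One way to picture workflow and task definition models of this kind is as a small dependency graph whose tasks run in dependency order. The Python sketch below is an assumption-laden illustration, not the models proposed in the paper: the `Task` and `Workflow` classes, their fields, and the sample task names are all hypothetical, and each `run` callable merely stands in for what would be a serverless function invocation.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

@dataclass
class Task:
    """Hypothetical task definition: a name, a callable, and its upstream tasks."""
    name: str
    run: Callable[[Dict[str, object]], object]
    depends_on: List[str] = field(default_factory=list)

@dataclass
class Workflow:
    """Hypothetical workflow definition: tasks executed in dependency order."""
    tasks: List[Task]

    def execute(self) -> Dict[str, object]:
        by_name = {t.name: t for t in self.tasks}
        graph = {t.name: set(t.depends_on) for t in self.tasks}
        results: Dict[str, object] = {}
        # static_order() yields every task after all of its dependencies.
        for name in TopologicalSorter(graph).static_order():
            task = by_name[name]
            inputs = {dep: results[dep] for dep in task.depends_on}
            results[name] = task.run(inputs)
        return results

# Tasks are listed out of order on purpose; the sorter resolves the order.
workflow = Workflow(tasks=[
    Task("buffer", lambda inp: inp["load"] * 2, depends_on=["load"]),
    Task("load", lambda inp: 10.0),
    Task("report", lambda inp: f"buffered length: {inp['buffer']}", depends_on=["buffer"]),
])
results = workflow.execute()
```

Keeping the definition declarative (names and dependencies only) is what lets an orchestrator start each task on demand and release resources in between, which is the property the serverless design relies on.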
To leverage our past investments in ocean observations and modeling, and to fully exploit new observations, we must transform our infrastructure and tools for working with ocean data. Currently, data-intensive ocean research is accessible only to privileged institutions with the resources for high-performance computing and data storage. OpenOceanCloud will break down this barrier, providing a research platform to the thousands of potential oceanographers who lack such resources. Access to vast data sets and powerful computing environments can help remove the barriers related to low-bandwidth internet, intermittent power, and limited cyberinfrastructure. With this infrastructure, anyone can do science, anywhere, which empowers communities that have been historically excluded from full participation in oceanography.
Digital Twins of the environment can help reach sustainability goals and tackle climate-change-related issues. They will rely strongly on geospatial data and on the processing and analytics thereof. Cloud environments provide the flexibility and scalability needed to cope with potentially enormous geospatial datasets. This article explores the Azure cloud capabilities and places them in a broader multi-cloud perspective.