Commit 74593ba

Update documentation

1 parent fb801b7 commit 74593ba

75 files changed

Lines changed: 311408 additions & 959 deletions
.buildinfo

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: bb0945d3c7fa4c7a6bc2ef05821ab00a
+config: ab2a3ecbfd242a2254483b188fe759df
 tags: 645f666f9bcd5a90fca523b33c5a78b7
````

_sources/background/context_motivation.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -1,8 +1,8 @@
-# Context & Motivation
+# 2.1 Context & Motivation
 
 This book demonstrates scientific workflows using publicly-available, cloud-optimized geospatial datasets and open-source scientific software tools in order to address the need for educational resources related to new technologies and reduce barriers to entry to working with earth observation data. The tutorials in this book focus on the complexities inherent to working with n-dimensional, gridded datasets and use the core stack of software packages built on and around the Xarray data model.
 
-### *I. Moving away from the 'download model' of scientific data analysis*
+### *Moving away from the 'download model' of scientific data analysis*
 
 Technological developments in recent decades have engendered fundamental shifts in the nature of scientific data and how it is used for analysis.
 
@@ -11,7 +11,7 @@ Technological developments in recent decades have engendered fundamental shifts
 -- {cite}`abernathey_2021_cloud`
 ```
 
-### *II. Increasingly large, cloud-optimized data means new tools and approaches for data management*
+### *Increasingly large, cloud-optimized data means new tools and approaches for data management*
 
 The increase in publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis ({cite:t}`abernathey_2021_cloud,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user`).
 
@@ -21,7 +21,7 @@ The increase in publicly available earth observation data has transformed scientific
 Volume of NASA Earth Science Data archives, including growth of existing-mission archives and new missions, projected through 2029. Source: [NASA EarthData - Open Science](https://www.earthdata.nasa.gov/about/open-science).
 ```
 
-### *III. Asking questions of complex datasets*
+### *Asking questions of complex datasets*
 
 Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (e.g. temperature) and metadata that provides auxiliary information that is required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With the increasingly complex and large volume of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts {cite}`mathieu_esas_2017,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user`.
````

_sources/background/data_cubes.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -1,8 +1,8 @@
-# Data cubes
+# 2.2 Data cubes
 
 The term **data cube** is used frequently throughout this book. This page contains an introduction to ***what*** a data cube is and ***why*** it is useful.
 
-## *I. Anatomy of a data cube*
+## *Anatomy of a data cube*
 
 The key object of analysis in this book is a [raster data cube](https://openeo.org/documentation/1.0/datacubes.html). Raster data cubes are n-dimensional objects that store continuous measurements or estimates of physical quantities that exist along given dimension(s). Many scientific workflows involve examining how a variable (such as temperature, windspeed, relative humidity, etc.) varies over time and/or space. Data cubes are a way of organizing geospatial data that lets us ask these questions.
 
@@ -55,7 +55,7 @@ A data cube should be organized out of these building blocks adhering to the following
 **Attributes** - Metadata that can be assigned to a given `xr.Dataset` or `xr.DataArray` that is ***static*** along that object's dimensions.
 :::
 
-## *II. 'Analysis-ready' data*
+## *'Analysis-ready' data*
 The process described above is an example of preparing data for analysis. Thanks to development and collaboration across the earth observation community, analysis-ready data for earth observation has a specific, technical definition:
 
 ```{epigraph}
@@ -67,15 +67,15 @@ The development and increasing adoption of analysis-ready specifications for satellite
 
 However, many legacy datasets still require significant effort in order to be considered 'analysis-ready'. Furthermore, for analysts, 'analysis-ready' can be a subjective and evolving label. Semantically, from a user perspective, analysis-ready data can be thought of as data whose structure is conducive to scientific analysis.
 
-## *III. Analysis-ready data cubes & this book*
+## *Analysis-ready data cubes & this book*
 The tutorials in this book contain examples of data at various degrees of 'analysis-ready'. [Tutorial 1: ITS_LIVE](../itslive/itslive_intro.md) uses a dataset of multi-sensor observations that is already organized as a `(x,y,time)` cube with a common grid. In [Tutorial 2: Sentinel-1](../sentinel1/s1_intro.md), we will see an example of a dataset that has undergone intensive processing to make it 'analysis-ready' but requires further manipulation to arrive at the `(x,y,time)` cube format that will be easiest to work with.
 
 ### References
 - {cite:t}`montero_2024_EarthSystemData`
 - {cite:t}`appel_2019_ondemand`
 - {cite:t}`giuliani_2019_EarthObservationOpen`
 - {cite:t}`truckenbrodt_2019_Sentinel1ARD`
-## Additional data cube resources
+### Additional data cube resources
 - [OpenEO - Data Cubes](https://openeo.org/documentation/1.0/datacubes.html)
 - [Open Data Cube initiative](https://www.opendatacube.org/about-draft)
 - [The Datacube Manifesto](http://www.earthserver.eu/tech/datacube-manifesto/The-Datacube-Manifesto.pdf)
````
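The data cube anatomy described in `data_cubes.md` above (dimensions, coordinates, a physical observable, and static attributes, arranged as an `(x,y,time)` cube) can be sketched schematically. This is a hypothetical, stdlib-only stand-in for illustration, not the book's actual code; in practice these roles are filled by `xr.Dataset` / `xr.DataArray`:

```python
from dataclasses import dataclass, field


@dataclass
class DataCube:
    """Minimal stand-in for an (time, y, x) raster data cube.

    dims name the axes, coords label positions along each axis,
    data holds the physical observable, and attrs carry metadata
    that is static along the cube's dimensions.
    """

    dims: tuple
    coords: dict
    data: list
    attrs: dict = field(default_factory=dict)

    def sel_time(self, t):
        """Select the 2-D (y, x) slice at time label t."""
        i = self.coords["time"].index(t)
        return self.data[i]


# A toy 2x2x2 ice-velocity cube (hypothetical values)
cube = DataCube(
    dims=("time", "y", "x"),
    coords={"time": ["2020-01", "2020-02"], "y": [0, 1], "x": [0, 1]},
    data=[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]],
    attrs={"units": "m/yr", "description": "toy ice velocity"},
)
print(cube.sel_time("2020-02"))  # [[5.0, 6.0], [7.0, 8.0]]
```

In Xarray terms, `dims`, `coords`, and `attrs` correspond directly to the building blocks defined in the diff above, and `sel_time` mimics label-based selection along the time dimension.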

_sources/background/relevant_concepts.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-# Relevant concepts
+# 2.6 Relevant concepts
 
 ## *Larger than memory data, parallelization and Dask*
 
````
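The heading above refers to larger-than-memory data and Dask. The core idea, reducing data one chunk at a time so the full array never has to sit in memory, can be sketched without Dask itself; `chunked_mean` below is a hypothetical stdlib-only illustration of that pattern, not Dask's API:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of `size` items from an iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def chunked_mean(stream, chunk_size=1000):
    """Compute a mean one chunk at a time, keeping only running
    partial sums in memory - the same idea Dask applies to array blocks."""
    total, count = 0.0, 0
    for chunk in chunked(stream, chunk_size):
        total += sum(chunk)  # reduce each chunk independently
        count += len(chunk)
    return total / count


# A data stream too large to want in memory at once (simulated by a generator)
values = (x * 0.5 for x in range(1_000_000))
print(chunked_mean(values))  # 249999.75
```

Dask generalizes this by building a task graph over chunks and executing the per-chunk reductions in parallel.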

_sources/background/software.md

Lines changed: 34 additions & 0 deletions
````diff
@@ -0,0 +1,34 @@
+# 2.5 Software and computing environment
+
+On this page you'll find information about the computing environment and datasets that will be used in both of the tutorials in this book.
+
+## *Running tutorial materials locally*
+
+There are two options for creating a software environment: [pixi](https://pixi.sh/latest/) or [mamba](https://mamba.readthedocs.io/en/latest/) / [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html). We recommend using pixi to create a consistent environment on different operating systems. If you have pixi installed, follow the steps below; otherwise, follow the steps for conda/mamba below.
+
+### To use pixi
+1. Clone the book's GitHub repository:
+```git clone https://github.com/e-marshall/cloud-open-source-geospatial-datacube-workflows.git```
+
+2. Navigate into the repository:
+```cd cloud-open-source-geospatial-datacube-workflows```
+
+3. Execute `pixi run` for each tutorial:
+```pixi run itslive```
+```pixi run sentinel1```
+
+### To use conda/mamba
+
+1. Clone this book's GitHub repository:
+```git clone https://github.com/e-marshall/cloud-open-source-geospatial-datacube-workflows.git```
+
+2. Navigate into the `book` sub-directory:
+```cd cloud-open-source-geospatial-datacube-workflows/book```
+
+3. Create and activate a conda environment from the `environment.yml` file located in the repo:
+```conda env create -f .binder/environment.yml```
+
+4. Start JupyterLab and navigate to the directories containing the Jupyter notebooks (`itslive/nbs` and `s1/nbs`):
+```jupyter lab```
+
+Both tutorials use functions that are stored in scripts associated with each dataset. You can find these scripts here: [`itslive_tools.py`](../itslive/nbs/itslive_tools.py) and [`s1_tools.py`](../s1/nbs/s1_tools.py).
````

_sources/background/tutorial_data.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-# Data used in tutorials
+# 2.4 Data used in tutorials
 
 We use many different datasets throughout these tutorials. While each tutorial is focused on a different raster time series (ITS_LIVE ice velocity data and Sentinel-1 imagery), we also use vector data to represent points of interest.
 
````

_sources/background/tutorials_overview.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-# Tutorials overview
+# 2.3 Tutorials overview
 
 This book contains two distinct tutorials, each of which focuses on a different cloud-optimized geospatial dataset and different cloud-computing resources. Read more about the datasets used [here](tutorial_data.md).
 
````

_sources/conclusion/datacubes_revisited.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-# Data Cubes Revisited
+# 5.3 Data Cubes Revisited
 
 In this book, we saw a range of real-world datasets and the steps required to prepare them for analysis. Several guiding principles for assembling and using analysis-ready data cubes in Xarray can be drawn from these examples.
 
````

_sources/conclusion/summary.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-# Tutorials summary
+# 5.2 Tutorials summary
 
 In this book, we worked through tutorials accessing two satellite remote sensing datasets, preparing them for analysis and performing exploratory data analysis and visualization.
 
````

_sources/conclusion/wrapping_up.md

Lines changed: 15 additions & 24 deletions
````diff
@@ -1,36 +1,27 @@
-# Wrapping up
+# 5.1 Wrapping up
 
 It is a popular refrain, and a sentiment many analysts can likely relate to, "that 80% of data analysis is spent on the cleaning and preparing of data" {cite:t}`Wickham_2014_Tidy,Dasu_2003_Exploratory`. This book focuses on the data cleaning and preparation steps of an analytical workflow that ingests satellite remote sensing time series datasets. We draw on the wealth of knowledge and research that attends to this topic in order to produce tutorials that demonstrate and explain these concepts in the context of cloud-optimized, publicly available array data and the software ecosystem built around the Xarray data model in Python.
 
-In this chapter, you will find summaries of the concepts covered throughout the Jupyter Notebooks included in this book, a return to the introduction's [discussion of data cubes](../background/data_cubes.md) with synthesis and lessons learned from the tutorials, and a short discussion of the broader context of this book and next steps for interested readers.
+In this chapter, you will find summaries of the concepts covered throughout the Jupyter Notebooks included in this book and a return to the introduction's [discussion of data cubes](../background/data_cubes.md) that synthesizes lessons learned in the tutorials.
 
-## Tutorials Summary[$\tiny \nearrow$](summary.md)
-
-This book features two tutorials, each focuses on a different earth observation dataset and containing five notebooks that cover different steps of a typical workflow such as data access, manpiulation and organizatoin and visualization and exploratory analysis. In this section, you will find a few of the common topics throughout these notebooks and links to where they are addressed in each tutorial.
-
-## Data cubes revisited [$\tiny \nearrow$](datacubes_revisited.md)
-
-Synthesizing lessons from tutorial examples to enumerate guidance and best-practices for working Xarray geospatial data cubes.
-
-## Broader context [$\tiny \nearrow$]()
-
-It is a popular refrain, and a sentiment many analysts can likey relate to, "that 80% of data analysis is spent on the cleaning and preparing of data" {cite:t}`Wickham_2014_Tidy,Dasu_2003_Exploratory`. This book focuses on the data cleaning and preparation steps of an analytical workflow. We draw on the wealth of knowledge and research that attends to this topic in order to produce tutorials that demonstrate and explain these topics in the context of satellite remote sensing earth observation datasets and the software ecosystem built around the Xarray data model in Python.
+### Broader context [$\tiny \nearrow$]()
+
 This book largely focused on the beginning steps of scientific workflows where data is prepared for analysis and manipulated to support different types of analysis.
 
-### Open source tools and packages
-In this book, we mainly focused on Xarray and several tools within the Xarray ecosystem and that integrate with Xarray to streamline data cube workflows such as:
-- Xvec
-- cf_xarray
-- Dask
-- PySTAC
-- stackstac
-- GeoPandas
-- holoviz
-
-There are many exciting open-source projects and tools related to Xarray data cubes that were not highlighted in this book. A few are:
+#### Open source tools and packages
+We mainly use Xarray and tools within the Xarray ecosystem. There are many exciting open-source projects and tools related to Xarray data cubes that were not highlighted in this book. A few are:
 - [XRLint](https://github.com/bcdev/xrlint)
 - [LexCube](https://www.lexcube.org/)
 - [xcube](https://xcube.readthedocs.io/en/latest/)
 - [cubo](https://github.com/ESDS-Leipzig/cubo)
 - [Open Data Cube](https://opendatacube.readthedocs.io/en/latest/index.html)
 
+
+## 5.2 Tutorials Summary [$\tiny \nearrow$](summary.md)
+
+This book features two tutorials, each focusing on a different earth observation dataset and containing five notebooks that cover different steps of a typical workflow such as data access, manipulation and organization, and visualization and exploratory analysis. In this section, you will find a few of the common topics throughout these notebooks and links to where they are addressed in each tutorial.
+
+## 5.3 Data cubes revisited [$\tiny \nearrow$](datacubes_revisited.md)
+
+Synthesizing lessons from tutorial examples to enumerate guidance and best-practices for working with Xarray geospatial data cubes.
+
````