
Commit 9896259

add data download instructions and code (#36)
* add data download instructions and code
* update text instructions
* add nb
* reference + text updates
1 parent b230f3d commit 9896259

16 files changed

Lines changed: 531 additions & 74 deletions

book/_config.yml

Lines changed: 1 addition & 1 deletion
@@ -243,7 +243,7 @@ sphinx:
 b_s1_nb5: "B. Ensure direct comparison between datasets"
 b1_s1_nb5: "1) Subset time series to common time steps"
 b2_s1_nb5: "2) Handle differences in spatial resolution"
-b3_s1_nb5: 3) Mask missing data from one dataset"
+b3_s1_nb5: "3) Mask missing data from one dataset"
 c_s1_nb5: "C. Combine objects"
 c1_s1_nb5: "1) `expand_dims()` to add 'source' dimension"
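The `expand_dims()` step named in the notebook outline above can be sketched as follows. This is a minimal illustration with synthetic data; the source labels and variable names are hypothetical:

```python
import numpy as np
import xarray as xr

# Two hypothetical velocity time series from different sources,
# aligned on the same 'time' dimension.
a = xr.DataArray(np.random.rand(3), dims="time", name="v")
b = xr.DataArray(np.random.rand(3), dims="time", name="v")

# Tag each object with a length-1 'source' dimension...
a = a.expand_dims(source=["its_live"])
b = b.expand_dims(source=["sentinel1"])

# ...then concatenate along the new dimension to build one datacube.
combined = xr.concat([a, b], dim="source")
print(dict(combined.sizes))
```

This is the same pattern the notebook uses to compare datasets side by side: once both objects share a `source` dimension, selection and plotting can be done per source with ordinary Xarray indexing.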

book/background/relevant_concepts.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ Climate Forecast (CF) Metadata Conventions

 >The CF metadata conventions are designed to promote the processing and sharing of files created with the NetCDF API. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities. The CF convention includes a standard name table, which defines strings that identify physical quantities.

-CF metadata conventions set common expectations for metadata names and locations across datasets. In this tutorial, we will use tools such as [cf_xarray]() that leverage CF conventions to add programmatic handling of CF metadata to Xarray objects, meaning that users can spend less time wrangling metadata. 🤩
+CF metadata conventions set common expectations for metadata names and locations across datasets. In this tutorial, we will use tools such as [cf_xarray]() that leverage CF conventions to add programmatic handling of CF metadata to Xarray objects, meaning that users can spend less time wrangling metadata.

 Spatio-temporal Asset Catalog (STAC)

 STAC is a metadata specification for geospatial data that allows the data to be more easily "worked with, indexed, and discovered" [$\tiny \nearrow$](https://stacspec.org/en). It does this by setting a common format for how metadata will be structured. This functions like setting a common expectation that all users of the data can rely on so that they know where certain information will be located and how it will be stored.
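To make the "common format" idea concrete, here is a minimal STAC Item written as a plain Python dictionary; the id, footprint, and asset URL are invented for illustration:

```python
# A minimal STAC Item. Every Item shares this layout, so tooling always
# knows where to find the footprint, timestamp, and data assets.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-sentinel1-scene",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[10.0, 46.0], [11.0, 46.0], [11.0, 47.0], [10.0, 47.0], [10.0, 46.0]]],
    },
    "bbox": [10.0, 46.0, 11.0, 47.0],
    "properties": {"datetime": "2021-06-01T00:00:00Z", "platform": "sentinel-1a"},
    "assets": {
        "vv": {"href": "https://example.com/scene_vv.tif", "type": "image/tiff; application=geotiff"}
    },
    "links": [],
}

# The acquisition time is always under properties/datetime:
print(item["properties"]["datetime"])
```

Search clients and catalog tools rely on exactly this predictability: a query for a time range or bounding box can be answered without opening the underlying data files.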

book/background/tutorial_data.md

Lines changed: 7 additions & 7 deletions
@@ -13,16 +13,16 @@ Here is a broad overview of the data included in this tutorial, including how it is
 | ITS_LIVE | [ITS_LIVE project, NASA JPL](https://its-live.jpl.nasa.gov/) | Zarr | AWS S3|

-ITS_LIVE is a dataset of ice velocity observations derived from applying a feature tracking algorithm to pairs of satellite imagery. Ice velocity refers to the downslope movement of glaciers and ice sheets. Because glaciers and ice sheets are dynamic elements of our climate system, they lose or gain mass in response to changes in climate conditions such as warmer temperatures or increased snowfall, measuring variability in the speed of ice flow can help scientists better understand trends in glacier dynamics and interactions between glaciers and climate.
+ITS_LIVE is a dataset of ice velocity observations derived from applying a feature tracking algorithm to pairs of satellite imagery. Ice velocity refers to the downslope movement of glaciers and ice sheets {cite}`Gardner_Scambos_2022`. Glaciers and ice sheets are dynamic elements of our climate system that lose or gain mass in response to changes in climate conditions such as warmer temperatures or increased snowfall. Measuring variability in the speed of ice flow can therefore help scientists better understand trends in glacier dynamics and interactions between glaciers and climate.

 Part of what is so exciting about ITS_LIVE is that it combines image pairs from a number of satellites, including imagery from optical (Landsat 4, 5, 7, 8, 9 & Sentinel-2) and synthetic aperture radar (Sentinel-1) sensors. For this reason, ITS_LIVE time series data can be quite large. Another exciting aspect of the ITS_LIVE dataset is that the image pair time series data is made available as Zarr data cubes stored in cloud object storage on Amazon Web Services (AWS), meaning that users don't need to download massive files to start working with the data!

 ITS_LIVE produces a number of data products in addition to the image pair time series that we use in this tutorial, and provides different options to access the data. Check them out [here](https://its-live.jpl.nasa.gov/#access).

 **Documentation & References**:
 Be sure to also check out the ITS_LIVE image pair velocities [documentation](http://its-live-data.jpl.nasa.gov.s3.amazonaws.com/documentation/ITS_LIVE-Landsat-Scene-Pair-Velocities-v01.pdf) and papers on the ITS_LIVE processing methodology:
-- [Processing methodology for the ITS_LIVE Sentinel-1 ice velocity products](https://doi.org/10.5194/essd-14-5111-2022). Lei et al., (2022)
-- [Autonomous Repeat Image Feature Tracking (autoRIFT) and its application for tracking ice displacement](https://www.mdpi.com/2072-4292/13/4/749). Lei et al., (2021)
+- [Autonomous Repeat Image Feature Tracking (autoRIFT) and its application for tracking ice displacement](https://www.mdpi.com/2072-4292/13/4/749). {cite}`lei_2021_AutonomousRepeatImage`
+- [Processing methodology for the ITS_LIVE Sentinel-1 ice velocity products](https://doi.org/10.5194/essd-14-5111-2022). {cite}`Lei_2022_Processing`

 **Further reading on ice velocities**:
 - [NASA/USGS Provide Global View of Speed of Ice](https://www.jpl.nasa.gov/news/nasausgs-provide-global-view-of-speed-of-ice/)
@@ -69,7 +69,7 @@ SAR data is collected in slant range, which is the viewing geometry of the side-
 | Sentinel-1 RTC | [Alaska Satellite Facility](https://asf.alaska.edu/) | COG (locally as GeoTIFF) | Local |

-We use Sentinel-1 RTC imagery processed by Alaska Satellite Facility's Hypbrid Pluggable Processing Pipeline (**HyP3**). This is a processing platform that allows users to perform processing steps necessary for analysis-ready SAR data through ASF.
+We use Sentinel-1 RTC imagery processed by Alaska Satellite Facility's Hybrid Pluggable Processing Pipeline (**HyP3**) {cite}`hogenson_2024_10903242`. This is a processing platform that allows users to perform the processing steps necessary for analysis-ready SAR data through ASF.

 From the [ASF HyP3 Documentation](https://hyp3-docs.asf.alaska.edu/):
 HyP3 is a service for processing Synthetic Aperture Radar (SAR) imagery that addresses many common issues for users of SAR data:
@@ -81,7 +81,7 @@ HyP3 is a service for processing Synthetic Aperture Radar (SAR) imagery that add
 HyP3 solves these problems by providing a free service where people can request SAR processing on-demand. These processing requests are picked up by automated systems, which handle the complexity of SAR processing on behalf of the user. HyP3 doesn't require users to have a lot of knowledge of SAR processing before getting started; users only need to submit the input data and set a few optional parameters if desired. With HyP3, analysis-ready products are just a few clicks away.

-The data in this tutorial was processed using HyP3 and then published via Zenodo [here](https://zenodo.org/records/7236413#.Y1rNi37MJ-0). For more on how to use HyP3 for your own data processing needs, check out their [tutorials page](https://hyp3-docs.asf.alaska.edu/tutorials/).
+The data in this tutorial was processed using HyP3 {cite}`andrew_johnston_2022_6629125` and then published via Zenodo [here](https://zenodo.org/records/7236413#.Y1rNi37MJ-0). For more on how to use HyP3 for your own data processing needs, check out their [tutorials page](https://hyp3-docs.asf.alaska.edu/tutorials/).
 :::

 :::{tab-item} Microsoft Planetary Computer
:::{tab-item} Microsoft Planetary Computer
@@ -100,7 +100,7 @@ Further reading on SAR data and Sentinel-1:
 - [ASF Introduction to SAR](https://hyp3-docs.asf.alaska.edu/guides/introduction_to_sar/)
 - [NASA Earth Observation Data Basics - SAR](https://www.earthdata.nasa.gov/learn/earth-observation-data-basics/sar#toc-resources)
 - [University of Alaska Fairbanks - Microwave Remote Sensing](https://radar.community.uaf.edu/)
-- [Mathematical tutorial on SAR](https://www.earthdata.nasa.gov/s3fs-public/2024-06/sar%20mathematical%20tutorial.pdf) (by Margaret Cheney, from NASA EarthData)
+- Mathematical tutorial on SAR {cite}`cheney_SAR_2001`, publicly available via NASA [EarthData](https://www.earthdata.nasa.gov/s3fs-public/2024-06/sar%20mathematical%20tutorial.pdf)

 ## *Vector data*

@@ -110,7 +110,7 @@ Further reading on SAR data and Sentinel-1:
 | :-----------:|:---------- | :------------- | :--------------- |
 | Randolph Glacier Inventory | [RGI Consortium](https://www.glims.org/RGI/) | Shapefile | NSIDC |

-The Randolph Glacier Inventory (RGI) is a community-driven public dataset that provides outlines and auxiliary information such as area, length and asepct of glaciers across the world. RGI is a subset of the Global Land Ice Measurements from Space ([GLIMS](https://www.glims.org/)) initiative and RGI data is hosted by the National Snow and Ice Data Center ([NSDIC](https://nsidc.org/data/nsidc-0770/versions/7)). Read more about the RGI project [here](http://www.glims.org/rgi_user_guide/01_introduction.html).
+The Randolph Glacier Inventory (RGI) is a community-driven public dataset that provides outlines and auxiliary information such as area, length, and aspect of glaciers across the world {cite}`RGI_Consortium_2023`. RGI is a subset of the Global Land Ice Measurements from Space ([GLIMS](https://www.glims.org/)) initiative, and RGI data is hosted by the National Snow and Ice Data Center ([NSIDC](https://nsidc.org/data/nsidc-0770/versions/7)). Read more about the RGI project [here](http://www.glims.org/rgi_user_guide/01_introduction.html).
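Working with RGI-style glacier outlines as vector data can be sketched as follows. In the tutorial the outlines come from the downloaded RGI shapefile via `gpd.read_file(...)`; here we build two toy outlines instead, and the ids and CRS choice are illustrative only:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two toy "glacier outlines" standing in for rows of an RGI shapefile.
gdf = gpd.GeoDataFrame(
    {
        "rgi_id": ["toy-glacier-1", "toy-glacier-2"],
        "geometry": [
            Polygon([(0, 0), (0, 1), (1, 1), (1, 0)]),
            Polygon([(2, 0), (2, 2), (4, 2), (4, 0)]),
        ],
    },
    crs="EPSG:32645",  # a projected CRS, so .area is in square metres
)

# Vector attributes like area are one call away.
gdf["area_m2"] = gdf.geometry.area
print(gdf["area_m2"].tolist())
```

The same GeoDataFrame operations (area, spatial joins, clipping) are what the tutorial uses to pair glacier outlines with raster velocity data.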

 :::{admonition} RGI data used in this tutorial

book/endmatter/about_this_book.md

Lines changed: 10 additions & 3 deletions
@@ -1,12 +1,19 @@
 # About this book

-These tutorials were initially developed while Emma Marshall interned with the Summer Internships in Parallel Computational Sciences ([SIParCS](https://www.cisl.ucar.edu/outreach/internships)) program at the National Center for Atmospheric Research ([NCAR](https://ncar.ucar.edu/)). Jessica Scheick, Scott Henderson, and Deepak Cherian were internship supervisors for this project. The internship was also supported by a NASA Open Source Tools, Frameworks, and Libraries program (Award 80NSSC22K0345), with a specific focus on developing educational resources for working with cloud-hosted data using Xarray. Tutorial development continued after the conclusion of the SIParCS internship when Emma Marshall returned to the University of Utah as a Ph.D. student, where she was supported by a (insert finesst grant no ? ).
+These tutorials were initially developed while Emma Marshall interned with the Summer Internships in Parallel Computational Sciences ([SIParCS](https://www.cisl.ucar.edu/outreach/internships)) program at the National Center for Atmospheric Research ([NCAR](https://ncar.ucar.edu/)). Jessica Scheick, Scott Henderson, and Deepak Cherian were internship supervisors for this project. The internship was also supported by a NASA Open Source Tools, Frameworks, and Libraries program (Award 80NSSC22K0345), with a specific focus on developing educational resources for working with cloud-hosted data using Xarray. Tutorial development continued after the conclusion of the SIParCS internship when Emma Marshall returned to the University of Utah as a Ph.D. student, where she was supported by a FINESST Fellowship Grant (80NSSC22K1536).

 ## Contributing

 If you'd like to contribute to this book, please start a discussion or raise an issue in the GitHub [repository](https://github.com/e-marshall/cloud-open-source-geospatial-datacube-workflows).

 ## Citation

-If you use this material, please include the following citation:
+If you use this material, please include the following citation:
+This book is currently under review in the [Journal of Open Source Education](https://jose.theoj.org/); please check back later for a citation.
+
+## Acknowledgements
+
+This book is the product of many discussions and developments within the open-source community. All of the workflows demonstrated throughout these tutorials are made possible by open-source developers and maintainers. Below is a full list of all open-source libraries used in this book:
+
+cf_xarray {cite}`cherian_2025_cfxarray`, Dask {cite}`DaskLibrary`, Folium {cite}`filipe_2025_folium`, GeoPandas {cite}`Jordahl_Bossche_Fleischmann_McBride_Wasserman_Badaracco_Gerard_Snow_Tratner_Perry_etal_2021`, Holoviz {cite}`philipp_rudiger_2020_holoviz`, Jupyter Notebook {cite}`Kluyver2016jupyter`, Matplotlib {cite}`Caswell_2021_matplotlib`, NumPy {cite}`harris2020array`, Pandas {cite}`Pandasteam_2024`, Planetary_Computer {cite}`Source_McFarland_Emanuele_Morris_Augspurger_2022`, PyPROJ {cite}`snow_2025_pyproj`, SciPy {cite}`ralf_gommers_2024_scipy`, Shapely {cite}`Gillies_van_der_Wel_Van_den_Bossche_Taves_Arnott_Ward_others_2022`, RioXarray {cite}`rioxarray_2022_dev`, Xarray {cite}`Hoyer_Hamman_2017`, Zarr {cite}`Zarr_2020_dev`, [Xvec](https://xvec.readthedocs.io/en/stable/), [contextily](https://contextily.readthedocs.io/en/latest/), [PySTAC](https://pystac.readthedocs.io/en/stable/), [PySTAC Client](https://pystac-client.readthedocs.io/en/stable/), [Rich](https://rich.readthedocs.io/en/stable/introduction.html), [stackstac](https://stackstac.readthedocs.io/en/latest/).

+This book was made using Jupyter Book {cite}`executable_books_community_2020_4539666`, which uses a number of tools developed by the [Executable Books](https://executablebooks.org/) project.

book/endmatter/appendix.md

Lines changed: 6 additions & 7 deletions
@@ -1,12 +1,14 @@
 # Appendix

-While developing this book, we encountered different examples that didn't always fit into the overall scope of the tutorials, but still may be useful to others.
+While developing this book, we encountered different examples that didn't always fit into the overall scope of the tutorials, but still may be useful to others, which we've included here.

-## [1. Troubleshooting geometry types (ITS_LIVE tutorial)](nbs/1_handle_mult_geom_types.ipynb)
+## 1. Troubleshooting visualizations with different geometry types[$\tiny \nearrow$](nbs/1_handle_mult_geom_types.ipynb)
+*From the ITS_LIVE [tutorial](../itslive/itslive_intro.md).*

 In the first tutorial, while making an [interactive visualization of vector dataframes](../itslive_nbs/3_combining_raster_vector_data.ipynb), we encountered a warning. This notebook includes a step-by-step demonstration of troubleshooting this warning, identifying its source and resolving it.

-## [2. Reading a stack of files with `xr.open_mfdataset()` (Sentinel-1 tutorial)](nbs/2_read_w_xropen_mfdataset.ipynb)
+## 2. Reading multiple files with `xr.open_mfdataset()`[$\tiny \nearrow$](nbs/2_read_w_xropen_mfdataset.ipynb)
+*From the Sentinel-1 RTC [tutorial](../sentinel1/s1_intro.md).*

 Xarray's `xr.open_mfdataset()` [function](https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html) allows the user to read in and combine multiple files at once to produce a single `xr.DataArray` object. This approach was explored when developing the [Read ASF-processed Sentinel-1 RTC data notebook](../sentinel1/nbs/1_read_asf_data.ipynb). However, `xr.open_mfdataset()` didn't work well for this purpose because, while the stack of raster files used in this example covers a common area of interest, it includes several different spatial footprints. This creates problems when specifying a chunking strategy.
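The basic `xr.open_mfdataset()` pattern can be sketched with two tiny synthetic NetCDF files (the file and variable names below are made up; with a real stack of rasters you would typically pass a glob pattern instead of an explicit list):

```python
import numpy as np
import xarray as xr

# Write two small single-time-step files sharing a 'time' coordinate.
for i, t in enumerate(["2021-01-01", "2021-01-13"]):
    xr.Dataset(
        {"gamma0": (("time", "y", "x"), np.random.rand(1, 4, 4))},
        coords={"time": [np.datetime64(t)]},
    ).to_netcdf(f"scene_{i}.nc")

# combine='by_coords' concatenates the files along the shared coordinate.
ds = xr.open_mfdataset(["scene_0.nc", "scene_1.nc"], combine="by_coords")
print(ds.sizes["time"])
```

This works cleanly here because both files share one spatial footprint; as noted above, mixed footprints in the Sentinel-1 stack are what made this approach awkward in practice.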

@@ -25,7 +27,4 @@ In addition to the documentation linked above, some other useful resources for `

 ```{note}
 If you wanted to select scenes from a single viewing geometry at the expense of a denser time series, `xr.open_mfdataset()` might work a bit better (this approach has not been tested).
-```
-
-## [3. Another regridding approach using `xESMF` (Sentinel-1 tutorial)](nbs/3_regridding_w_xesmf.ipynb)
-This notebook demonstrates an alternative approach to the regridding shown in [noteboook 5](../sentinel1/nbs/5_comparing_s1_rtc_datasets.ipynb) of Tutorial 2, but this time using a different regridding package.
+```

book/intro/software.md

Lines changed: 0 additions & 1 deletion
@@ -32,4 +32,3 @@ There are two options for creating a software environment: [pixi](https://pixi.s
 ```jupyterlab```

 Both tutorials use functions that are stored in scripts associated with each dataset. You can find these scripts here: [`itslive_tools.py`](../itslive/nbs/itslive_tools.py) and [`s1_tools.py`](../s1/nbs/s1_tools.py).
-

book/introduction.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Cloud-native geospatial datacube workflows with open-source tools

-Welcome to `Cloud-native geospatial datacube workflows with open-source tools`! This tutorial demonstrates steps of typical scientific workflows involving earth observation data with a focus on cloud-optimized data formats, larger-than memory data, and manipulating multi-dimensional datasets.
+Welcome to `Cloud-native geospatial datacube workflows with open-source tools`! This tutorial demonstrates steps of typical scientific workflows involving earth observation data with a focus on cloud-optimized data formats, larger-than-memory data, and manipulating multi-dimensional datasets to prepare them for analysis.

 We focus on data derived from different types of satellite imagery that are publicly available in cloud-hosted repositories such as [AWS S3](https://aws.amazon.com/s3/). These tutorials demonstrate how to work with this data in Python, using software packages from the popular [Pangeo](https://www.pangeo.io/) ecosystem that are built on and around the [Xarray](https://xarray.dev/) data model.
book/itslive/itslive_intro.md

Lines changed: 6 additions & 4 deletions
@@ -35,11 +35,11 @@ This tutorial contains jupyter notebooks demonstrating various steps of a typica

 This tutorial will spend a lot of time discussing the following concepts; if they're unfamiliar to you, we recommend first heading to [Relevant Concepts](../background/relevant_concepts.md).

-## 1. {term}`Larger than memory data`
+1. {term}`Larger than memory data`

-## 2. {term}`Dask`
+2. {term}`Dask`

-## 3. {term}`Chunking`
+3. {term}`Chunking`

 :::
 :::{tab-item} Learning goals
@@ -62,4 +62,6 @@ For instructions on setting up a computing environment needed for this tutorial,

 For more background on the data used in this tutorial, head to [Tutorial Data](../background/tutorial_data.md).

-::::
+::::
+
+To get started with this tutorial, make sure you've followed the instructions on the [Software](../intro/software.md) page for downloading the necessary material and setting up a virtual environment, then head to the first notebook.

book/itslive/nbs/1_accessing_itslive_s3_data.ipynb

Lines changed: 0 additions & 12 deletions
@@ -306,18 +306,6 @@
    "In addition to passing `url` to `xr.open_dataset()`, we include `chunks='auto'`. This introduces [dask](https://www.dask.org/) into our workflow; `chunks='auto'` will choose chunk sizes that match the underlying data structure; this is often ideal, but sometimes you may need to specify different chunking schemes. You can read more about choosing good chunk sizes [here](https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes); subsequent notebooks in this tutorial will explore different approaches to dask chunking. "
    ]
   },
- {
-  "cell_type": "markdown",
-  "id": "a844766e",
-  "metadata": {
-   "tags": [
-    "papermill-error-cell-tag"
-   ]
-  },
-  "source": [
-   "<span id=\"papermill-error-cell\" style=\"color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;\">Execution using papermill encountered an exception here and stopped:</span>"
-  ]
- },
  {
   "cell_type": "code",
   "execution_count": null,

0 commit comments
