diff --git a/intermediate/cataloging.ipynb b/intermediate/cataloging.ipynb
new file mode 100644
index 00000000..89225128
--- /dev/null
+++ b/intermediate/cataloging.ipynb
@@ -0,0 +1,61 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0",
+   "metadata": {
+    "vscode": {
+     "languageId": "plaintext"
+    }
+   },
+   "source": [
+    "# Data cataloging for Xarray\n",
+    "\n",
+    ":::{admonition} Under construction\n",
+    "This notebook is still very much under construction.\n",
+    ":::\n",
+    "\n",
+    "**Goals:** At the end of this tutorial, you'll have an overview of what data cataloging is, why it is done, and what tools are available. TODO: Refine goal\n",
+    "\n",
+    "## What is cataloging? Why is it useful?\n",
+    "\n",
+    "- There are many different ways to open Xarray datasets:\n",
+    "  - From file\n",
+    "    - NetCDF\n",
+    "    - Zarr\n",
+    "  - From an Icechunk store\n",
+    "  - From remote URLs\n",
+    "  - From custom engines (see tutorial x)\n",
+    "- Copying all of this from script to script is TIRING\n",
+    "- It's a data engineering problem (and one at the org level)\n",
+    "- What if we could map out the datasets we use at the org level and expose them as a collection of datasets? A ✨catalog✨, if you will 🧐\n",
+    "\n",
+    "\n",
+    "## \n",
+    "\n",
+    "\n",
+    "## Packages\n",
+    "\n",
+    "- odc-stac\n",
+    "- stackstac\n",
+    "- xpystac\n",
+    "- lazycogs(?)\n",
+    "- Intake v2\n",
+    "\n",
+    "\n",
+    "## More resources\n",
+    "\n",
+    "https://guide.cloudnativegeo.org/cookbooks/zarr-stac-report/data-consumers/\n",
+    "\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
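The notebook's motivation is that open calls, which vary by format and backend, get copied from script to script. A minimal sketch of a catalog in its simplest form is a single in-memory mapping from a dataset name to the target and `engine` needed to open it. Scripts then call one function instead of repeating `xarray.open_dataset` arguments. This assumes only `xarray.open_dataset` and its `engine` keyword. `SimpleCatalog`, `CatalogEntry`, the dataset names, and the file paths are hypothetical illustrations. They are not part of xarray, Intake, or any STAC package.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CatalogEntry:
    """How to open one dataset: a target plus arguments for xarray.open_dataset."""

    target: str  # local path, URL, or store location
    engine: str  # xarray backend name, e.g. "netcdf4" or "zarr"
    kwargs: dict[str, Any] = field(default_factory=dict)  # extra open_dataset kwargs
    description: str = ""


class SimpleCatalog:
    """Hypothetical toy catalog: dataset name -> instructions for opening it."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, name: str, entry: CatalogEntry) -> None:
        # Refuse silent overwrites so two teams can't clobber the same name.
        if name in self._entries:
            raise KeyError(f"dataset {name!r} is already registered")
        self._entries[name] = entry

    def names(self) -> list[str]:
        return sorted(self._entries)

    def describe(self, name: str) -> CatalogEntry:
        return self._entries[name]

    def open(self, name: str):
        # Import lazily so the catalog can be built and inspected without xarray.
        import xarray as xr

        entry = self._entries[name]
        return xr.open_dataset(entry.target, engine=entry.engine, **entry.kwargs)


# Hypothetical datasets: these paths are placeholders, not real files.
catalog = SimpleCatalog()
catalog.register(
    "sst_daily",
    CatalogEntry("data/sst_daily.nc", engine="netcdf4", description="NetCDF file"),
)
catalog.register(
    "era5_subset",
    CatalogEntry(
        "data/era5_subset.zarr",
        engine="zarr",
        kwargs={"chunks": {}},  # lazy, dask-backed loading (requires dask)
        description="Zarr store",
    ),
)

print(catalog.names())  # ['era5_subset', 'sst_daily']
# ds = catalog.open("sst_daily")  # needs the file to exist and netCDF4 installed
```

The key design choice is keeping the opening instructions as data rather than code. That way, the same entries could later move into a shared file or service without changing how scripts request a dataset.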