Noah Prime edited this page Nov 28, 2023 · 2 revisions

BASD with CMIP6

One archive you may use often with basd is the CMIP6 collection of datasets, which you can query and download using its browser tool here.

This archive is extremely large, however, so this page links to some resources that may help ease the burden on your system. It is possible to download datasets, run basd on them, and clear the original data from your disk, all within Python. Below are two examples of how you might do this.
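The download, process, and clean-up pattern described above can be sketched with the standard library alone. This is a minimal illustration, not part of basd: `run_with_scratch_dir`, `fake_download`, and `fake_process` are hypothetical names, and the two stand-in callables only exist so the sketch runs without network access. In practice the download step would fetch CMIP6 files and the process step would call basd.

```python
import tempfile
from pathlib import Path


def run_with_scratch_dir(download, process):
    """Download into a temporary directory, process, then delete the raw files.

    `download(directory)` and `process(paths)` are hypothetical callables
    supplied by the caller; they are not part of basd.
    """
    with tempfile.TemporaryDirectory() as scratch:
        paths = download(Path(scratch))
        # Everything under `scratch` is removed when this block exits, so
        # `process` must return (or write elsewhere) whatever should be kept.
        return process(paths)


# Stand-in callables so the sketch runs offline.
def fake_download(directory):
    path = directory / "raw.txt"
    path.write_text("1 2 3")
    return [path]


def fake_process(paths):
    return sum(int(x) for p in paths for x in p.read_text().split())


result = run_with_scratch_dir(fake_download, fake_process)
print(result)
```

Because the temporary directory is deleted as soon as processing finishes, only the processed result stays on disk or in memory.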

  1. Using Pangeo

    • Pangeo is a package and a community of people who provide computational tools for scalable analysis of large geoscience datasets. One tool allows us to query and download CMIP6 datasets as xarray objects, which is the format basd works with. See links here and here to learn more about Pangeo and Pangeo with CMIP6.
    • Some functions that may be useful are:
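    import intake  # open_esm_datastore requires the intake-esm plugin

    def fetch_pangeo_table(exps):
        """Build a table of native-grid ("gn") CMIP6 datasets available on Pangeo.
        :param exps:                  list of experiment IDs to keep, e.g. ["historical", "ssp245"].
        :return:                      a pandas DataFrame with one row per dataset, including its zstore URL.
        """
        # URL of the Pangeo CMIP6 catalog (the archive's table of contents).
        url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
        dat = intake.open_esm_datastore(url).df
        # Keep only datasets on the model's native grid and the columns we need.
        cols = ["source_id", "experiment_id", "member_id",
                "variable_id", "zstore", "table_id"]
        out = dat.loc[dat["grid_label"] == "gn", cols]
        out = out.rename(columns={"source_id": "model",
                                  "experiment_id": "experiment",
                                  "member_id": "ensemble",
                                  "variable_id": "variable",
                                  "table_id": "domain"})
        # Restrict to the requested experiments and drop repeated rows.
        out = out.loc[out["experiment"].isin(exps)].drop_duplicates().reset_index(drop=True)

        return out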

    to get a table of available datasets and their URLs, and

    import fsspec
    import xarray as xr

    def fetch_nc(zstore, **kwargs):
        """Open the data for a single Pangeo dataset.
        :param zstore:                str, the location of the CMIP6 Zarr store on Pangeo.
        :param kwargs:                extra keyword arguments passed on to xr.open_zarr.
        :return:                      an xarray Dataset containing the CMIP6 data read from Pangeo.
        """
        ds = xr.open_zarr(fsspec.get_mapper(zstore), **kwargs)
        return ds

    to open the dataset at the supplied URL as an xarray Dataset. Pangeo stores these datasets in Zarr format rather than as NetCDF files.
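Once the table is available, picking out the URLs to open is a matter of filtering its columns. The sketch below shows one way to do that. `select_zstores` is a hypothetical helper, and `demo_table` holds made-up rows with placeholder `zstore` values (not real Pangeo paths) so that the example runs without network access. The column names match those produced by `fetch_pangeo_table`.

```python
import pandas as pd


def select_zstores(table, **criteria):
    """Return the zstore URLs of rows matching every column=value criterion."""
    mask = pd.Series(True, index=table.index)
    for column, value in criteria.items():
        mask &= table[column] == value
    return table.loc[mask, "zstore"].tolist()


# Made-up rows standing in for the table returned by fetch_pangeo_table;
# the zstore values are placeholders, not real Pangeo paths.
demo_table = pd.DataFrame(
    {
        "model": ["MODEL-A", "MODEL-A", "MODEL-B"],
        "experiment": ["historical", "ssp245", "historical"],
        "ensemble": ["r1i1p1f1", "r1i1p1f1", "r1i1p1f1"],
        "variable": ["tas", "tas", "pr"],
        "zstore": ["gs://placeholder/a", "gs://placeholder/b", "gs://placeholder/c"],
        "domain": ["day", "day", "day"],
    }
)

urls = select_zstores(demo_table, model="MODEL-A", variable="tas", domain="day")
print(urls)
```

With a real table, each returned URL could then be passed to `fetch_nc` to open the corresponding dataset.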

  2. Using the ESGF API
