Update plot_events script #1274
AnnaKwa
commented
Jun 12, 2026
- Add option to cache the downloaded dataset instead of saving it to a temp directory. It currently defaults to caching so that the script is less tedious to call, but I can change the default to not cache if desired.
- Fix bug where the coarse data would select the 00 hour timestep if the event filename only had YYYYMMDD in the name. Now the script checks the config in the beaker dataset to get the exact event timestamp.
- Add TMP2m to the coarse variables to read
- Remove coarse PRESsfc relabeling to PRMSL since we now use PRMSL for both inputs and outputs.
The coarse panel was selected using a timestamp parsed from the event filename, defaulting to 12Z when the filename had no hour suffix. Most events are not at 12Z (e.g. the heat wave events are at 00Z), so the coarse field showed the wrong time of day. Read each event's date from the config.yaml saved with the beaker dataset instead, falling back to filename parsing (now defaulting to 00Z, the evaluator convention) only when the event is missing from the config.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
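The lookup order described in this commit message (config first, filename second, 00Z when no hour is given) could look roughly like the sketch below. It is only an illustration: the `event_time` helper, the `config_dates` mapping of event name to ISO timestamp, and the assumption that a filename stem ends in `YYYYMMDD` or `YYYYMMDDHH` are all hypothetical and not taken from the script or from the real config.yaml layout.

```python
import re
from datetime import datetime

# Assumed (hypothetical) filename convention: the stem ends in YYYYMMDD,
# optionally followed by a two-digit hour.
_DATE_RE = re.compile(r"(\d{8})(\d{2})?$")


def event_time(stem: str, config_dates: dict[str, str]) -> datetime:
    """Return the event timestamp, preferring the config over the filename."""
    # 1. Use the date recorded in the (hypothetical) config mapping if present.
    if stem in config_dates:
        return datetime.fromisoformat(config_dates[stem])
    # 2. Otherwise parse the filename stem.
    match = _DATE_RE.search(stem)
    if match is None:
        raise ValueError(f"no date found in {stem!r}")
    day = match.group(1)
    hour = match.group(2) or "00"  # default to 00Z when no hour suffix is present
    return datetime.strptime(day + hour, "%Y%m%d%H")
```

With this ordering, an event that is listed in the config never depends on the filename default, so the fallback only matters for events missing from the config.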
find_event_files keyed files by the event name with the date stripped, so when a dataset contained several dates for the same event (e.g. three Phl_tc_landfall files) only the last one alphabetically was processed. Key by the full filename stem instead; this also makes the keys match the event names in the evaluator config.yaml.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- """Find netCDF files matching the event naming pattern, keyed by event name."""
+ """Find netCDF files matching the event naming pattern, keyed by filename
+ stem (event name including date, so multiple dates of the same event are
+ kept)."""
Before, if there were files with the same name and different dates, only one was plotted.
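To make the collision concrete, here is a small sketch contrasting the two keying schemes. The helper names, the `_YYYYMMDD` suffix pattern, and the example dates are assumptions for illustration only; this is not the actual `find_event_files` implementation.

```python
import re
from pathlib import Path


def key_by_event_name(paths: list[Path]) -> dict[str, Path]:
    """Old scheme (sketch): strip a trailing _YYYYMMDD, so dates collide."""
    # Later files overwrite earlier ones that share the same stripped key.
    return {re.sub(r"_\d{8}$", "", p.stem): p for p in sorted(paths)}


def key_by_stem(paths: list[Path]) -> dict[str, Path]:
    """New scheme (sketch): key by the full filename stem, date included."""
    return {p.stem: p for p in sorted(paths)}


# Hypothetical filenames; the dates are made up.
paths = [
    Path(f"Phl_tc_landfall_{date}.nc")
    for date in ("20200101", "20200202", "20200303")
]
old_keys = key_by_event_name(paths)  # one entry: only the last file survives
new_keys = key_by_stem(paths)        # three entries: one per date
```

Because the dict comprehension iterates over sorted paths, the old scheme keeps whichever file sorts last, which matches the "only the last one alphabetically" behavior described in the commit message.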
I think there's one leftover error from the adjusted PRMSL conditional. I would also prefer the caching to be opt-in instead of defaulting to storage, though I don't feel strongly: the datasets are usually small and quick to download. At the very least, if it is not made opt-in, I would suggest tying the default cache to a location that gets wiped on machine reset, like /tmp/beaker, but that still persists across script runs.
ds_["PRMSL_coarse"].values[:] = np.nan
# For colorbar range, use only target and predicted (coarse is hidden)
arr = ds_[["PRMSL_target", "PRMSL_predicted"]].to_array()
else:
Does this else block need to be adjusted? It looks like it pairs with the if len(samples)... from above
def fetch_beaker_dataset(
    dataset_id: str,
    target_dir: str,
    prefix: str | None = None,
Prefix seems like a straightforward addition, but I didn't see it actually used anywhere.
    dataset_id: str,
    target_dir: str,
    prefix: str | None = None,
    cache_dir: str | None = "~/Downloads/beaker_cache",
I think a persistent cache would make more sense as opt-in, not opt-out.
    """
    if cache_dir is not None:
        cached = Path(cache_dir).expanduser() / dataset_id
        if cached.is_dir() and any(cached.iterdir()):
One thing for persistent caches flagged by Claude: a partial beaker fetch that failed would still pass this check. You could add a sentinel file after the subprocess completes successfully.
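A minimal sketch of the sentinel idea, for reference. The marker name `.fetch_complete`, the helper names, and the `fetch` callable (standing in for the subprocess call) are hypothetical and not taken from the script.

```python
from pathlib import Path
from typing import Callable

SENTINEL = ".fetch_complete"  # hypothetical marker filename


def is_cache_complete(cached: Path) -> bool:
    """A cache directory counts as valid only if the sentinel file exists."""
    return (cached / SENTINEL).is_file()


def fetch_with_sentinel(cached: Path, fetch: Callable[[Path], None]) -> Path:
    """Reuse a complete cache, otherwise fetch and mark it complete."""
    if is_cache_complete(cached):
        return cached
    cached.mkdir(parents=True, exist_ok=True)
    # `fetch` is expected to raise on failure (e.g. subprocess.run with
    # check=True), so the sentinel below is only written after success.
    fetch(cached)
    (cached / SENTINEL).touch()
    return cached
```

A partially populated directory then fails `is_cache_complete` and gets re-fetched on the next run, whereas the `any(cached.iterdir())` check would accept it.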