
Fix bug in cfdm.read that prevented some OPeNDAP URLS being read #407

Merged
davidhassell merged 4 commits into NCAS-CMS:main from davidhassell:opendap
May 30, 2026


@davidhassell

Contributor

Fixes #406

@davidhassell davidhassell added this to the NEXTVERSION milestone May 27, 2026
@davidhassell davidhassell added the bug (Something isn't working) and dataset read (Relating to reading datasets) labels May 27, 2026

@sadielbartholomew sadielbartholomew left a comment

Member


This all seems sensible, and the fix does let the URL be read in. However, when I inspect any of the fields for the example URL (I tried several), the data is not found. Is the data expected to be accessible, or just the metadata? (Illustrative example below.)

This might be fine (I'm not sure exactly what 'support' looks like for OPeNDAP reading, and the docs, notably the cfdm.read API reference, don't say much), but I want to check. It might also be a separate issue from the one raised here!

It seems from the traceback that the request for the array data is rejected with a 400 response - maybe the API has different endpoints or requirements for (larger) arrays than for metadata? Just a quick guess - it would be good to get some context before I spend any time investigating.

Example

# Setup from PR example and URL
import cfdm
cfdm.environment()
f = cfdm.read('https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_collection.nc')

# Showing data issue: see '??'
>>> f[13].data
<Data(2837, 258, 182): [[[??, ..., ??]]] mg C m^-2 d^-1>
>>> f[0].data
<Data(2837, 258, 182): [[[??, ..., ??]]] (mg C m^-3)*m>
>>> f[21].data
<Data(2837, 258, 182): [[[??, ..., ??]]] mg C m^-2 d^-1>

# Traceback
>>> f[13].data.array
Traceback (most recent call last):
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/implementations/http.py", line 444, in _info
    await _file_info(
    ...<5 lines>...
    )
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/implementations/http.py", line 860, in _file_info
    r.raise_for_status()
    ~~~~~~~~~~~~~~~~~~^^
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/aiohttp/client_reqrep.py", line 636, in raise_for_status
    raise ClientResponseError(
    ...<5 lines>...
    )
aiohttp.client_exceptions.ClientResponseError: 400, message='', url='https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_collection.nc'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<python-input-22>", line 1, in <module>
    f[13].data.array
  File "/home/slb93/git-repos/cfdm/cfdm/data/data.py", line 2768, in array
    a = self.compute(_cache_elements=False).copy()
        ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/git-repos/cfdm/cfdm/data/data.py", line 4223, in compute
    a = dx.compute()
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/dask/base.py", line 377, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/dask/base.py", line 685, in compute
    results = schedule(expr, keys, **kwargs)
  File "/home/slb93/git-repos/cfdm/cfdm/data/dask_utils.py", line 29, in cfdm_to_memory
    return np.asanyarray(a)
           ~~~~~~~~~~~~~^^^
  File "/home/slb93/git-repos/cfdm/cfdm/data/mixin/indexmixin.py", line 60, in __array__
    array = self._get_array()
  File "/home/slb93/git-repos/cfdm/cfdm/data/netcdf4array.py", line 83, in _get_array
    netcdf, address = self.open()
                      ~~~~~~~~~^^
  File "/home/slb93/git-repos/cfdm/cfdm/data/netcdf4array.py", line 227, in open
    return super().open(netCDF4.Dataset, mode="r", **kwargs)
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/git-repos/cfdm/cfdm/data/abstract/filearray.py", line 507, in open
    filename = fs.open(filename, "rb")
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/spec.py", line 1349, in open
    f = self._open(
        path,
    ...<4 lines>...
        **kwargs,
    )
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/implementations/http.py", line 383, in _open
    size = size or info.update(self.info(path, **kwargs)) or info["size"]
                               ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/implementations/http.py", line 457, in _info
    raise FileNotFoundError(url) from exc
FileNotFoundError: https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_collection.nc
...
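
One reading of the traceback above: cfdm/data/abstract/filearray.py passes the URL to fsspec's HTTP filesystem, which requests file info for the bare URL, gets a 400 back, and raises FileNotFoundError. A possible way to avoid that path is to recognise OPeNDAP-style URLs up front so they are not opened as plain HTTP files. The sketch below is only an illustration of that idea, not the change made in this PR: `looks_like_opendap` and `OPENDAP_PATH_HINTS` are hypothetical names, and the path hints are an assumption based on common server layouts (such as the `/dodsC/` fragment in the example URL).

```python
from urllib.parse import urlparse

# ASSUMPTION: path fragments often seen in OPeNDAP URLs; this is a
# heuristic for illustration, not a list that cfdm itself uses.
OPENDAP_PATH_HINTS = ("/dodsC/", "/opendap/")


def looks_like_opendap(url: str) -> bool:
    """Hypothetical helper: guess whether `url` is an OPeNDAP endpoint."""
    parsed = urlparse(url)
    # Only remote http(s) URLs are candidates; local paths are not.
    if parsed.scheme not in ("http", "https"):
        return False
    return any(hint in parsed.path for hint in OPENDAP_PATH_HINTS)


url = (
    "https://data.pmel.noaa.gov/aclim/thredds/dodsC/"
    "B10K-K20_Level2_CORECFS_integrated_collection.nc"
)
print(looks_like_opendap(url))
```

A check like this could decide whether to hand the URL string directly to the netCDF library or to open it through fsspec first; whether that matches the approach taken in the follow-up commit is not confirmed here.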

@davidhassell

Contributor Author

Great spot, Sadie. Fixed so that reading the data after the initial file scan works: 738ae6a

@sadielbartholomew sadielbartholomew left a comment

Member


Thanks for the quick response to the feedback. I can confirm that the new commit resolves the data reading issue and that everything then works. So, besides one typo, this is good to merge - please go ahead!

Just to note, reading the data array was very slow on my laptop, but that is fair enough given that I was on a frustratingly slow eduroam connection and that this is remote access to a fairly large array (f[10].size is 133214172). For the record, here is how bad it was (probably just an advert for using a wired connection rather than eduroam wifi!):

$ time python -c "import cfdm; f = cfdm.read('https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_collection.nc'); print(f[10])"

Field: long_name=time-averaged Large phytoplankton concentration, integrated over depth (ncvar%PhL)
...
python -c   2.17s user 0.10s system 15% cpu 14.578 total

$ time python -c "import cfdm; f = cfdm.read('https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_collection.nc'); print(f[10].data)"

[[[476.34832763671875, ..., --]]] (mg C m^-3)*m
python -c   2.28s user 0.10s system 8% cpu 26.585 total

$ time python -c "import cfdm; f = cfdm.read('https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_collection.nc'); print(f[10].data.array)"

[[[476.34832763671875 475.08892822265625 480.3956298828125 ...
...
python -c   5.26s user 2.10s system 1% cpu 8:08.37 total
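
To put the last timing in context, here is a rough back-of-envelope estimate of how much data the full-array read has to move. The shape comes from the Data reprs earlier in the thread and the wall-clock time from the `time` output above; the 4-byte element size is an assumption (the dtype is not shown anywhere in this conversation), and the elapsed time also includes the import and the initial file scan, so the rate is only a lower bound on the transfer speed.

```python
import math

# Shape taken from the Data repr, e.g. <Data(2837, 258, 182): ...>
shape = (2837, 258, 182)
size = math.prod(shape)  # total number of array elements

# ASSUMPTION: 32-bit floats; the real dtype is not stated in the thread.
itemsize = 4
nbytes = size * itemsize

# Wall-clock time of the full `.data.array` run: 8 min 8.37 s
elapsed_seconds = 8 * 60 + 8.37

print(f"elements:        {size}")
print(f"uncompressed:    {nbytes / 2**20:.0f} MiB")
print(f"effective rate:  {nbytes / elapsed_seconds / 1e6:.2f} MB/s")
```

If only part of the array is needed, indexing before asking for the values (for example `f[10].data[0].array`) should transfer much less, assuming the indexing stays lazy until `.array` is called.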

Comment thread cfdm/read_write/netcdf/netcdfread.py Outdated
Co-authored-by: Sadie L. Bartholomew <sadie.bartholomew@ncas.ac.uk>
@davidhassell davidhassell merged commit e923efe into NCAS-CMS:main May 30, 2026
Development

Successfully merging this pull request may close these issues.

cfdm.read(opendapUrl) doesn't work at v1.13.1.0

2 participants