Fix bug in cfdm.read that prevented some OPeNDAP URLs being read #407
sadielbartholomew
left a comment
This all seems sensible, and the fix does allow the URL to be read in. However, when I inspect any of the fields for the example URL (I tried several), the data is not found. Is the data expected to be accessible, or just the metadata? (Illustrative example below.)
This might be fine (I'm not sure exactly what 'support' looks like for OPeNDAP reading, and the docs, notably the cfdm.read API reference, don't say much), but I want to check. It might also be a separate issue from the immediate one raised!
It seems from the traceback that the request for the array data is rejected with an HTTP 400 response, which then surfaces as a FileNotFoundError. Maybe the API has different endpoints or requirements for (larger) arrays than for metadata? Just a quick guess; it would be good to get some context before I spend any time investigating.
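For context on the endpoint guess, here is a minimal sketch of how DAP2 (the protocol behind THREDDS `dodsC` URLs) conventionally splits requests: metadata is served from the `.dds` and `.das` suffixes, and array data from `.dods` with a constraint expression, so a plain HTTP request to the bare URL may well be rejected. The helper name `dap2_request_urls` is hypothetical, the variable name `PhL` is taken from the field output later in this thread, and none of this is claimed to be what cfdm or netCDF4 does internally.

```python
def dap2_request_urls(base_url, variable, shape):
    """Build the conventional DAP2 request URLs for one variable.

    Hypothetical helper for illustration only; not part of cfdm.
    """
    # DAP2 hyperslabs are [start:stride:stop] with an inclusive stop index.
    hyperslab = "".join(f"[0:1:{n - 1}]" for n in shape)
    return {
        "dds": f"{base_url}.dds",  # dataset structure (names, shapes, types)
        "das": f"{base_url}.das",  # attributes
        "dods": f"{base_url}.dods?{variable}{hyperslab}",  # binary array data
    }


base = (
    "https://data.pmel.noaa.gov/aclim/thredds/dodsC/"
    "B10K-K20_Level2_CORECFS_integrated_collection.nc"
)
# Request only the first time step of the (2837, 258, 182) array.
for kind, url in dap2_request_urls(base, "PhL", (1, 258, 182)).items():
    print(kind, url)
```

If this is the cause, it would mean that the bare URL needs to be handed to an OPeNDAP-aware client rather than opened as an ordinary remote file, but that is only an assumption based on the traceback.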
Example
# Setup from PR example and URL
import cfdm
cfdm.environment()
f = cfdm.read('https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_col\
lection.nc')
# Showing data issue: see '??'
>>> f[13].data
<Data(2837, 258, 182): [[[??, ..., ??]]] mg C m^-2 d^-1>
>>> f[0].data
<Data(2837, 258, 182): [[[??, ..., ??]]] (mg C m^-3)*m>
>>> f[21].data
<Data(2837, 258, 182): [[[??, ..., ??]]] mg C m^-2 d^-1>
# Traceback
>>> f[13].data.array
Traceback (most recent call last):
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/implementations/http.py", line 444, in _info
await _file_info(
...<5 lines>...
)
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/implementations/http.py", line 860, in _file_info
r.raise_for_status()
~~~~~~~~~~~~~~~~~~^^
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/aiohttp/client_reqrep.py", line 636, in raise_for_status
raise ClientResponseError(
...<5 lines>...
)
aiohttp.client_exceptions.ClientResponseError: 400, message='', url='https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_collection.nc'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<python-input-22>", line 1, in <module>
f[13].data.array
File "/home/slb93/git-repos/cfdm/cfdm/data/data.py", line 2768, in array
a = self.compute(_cache_elements=False).copy()
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/home/slb93/git-repos/cfdm/cfdm/data/data.py", line 4223, in compute
a = dx.compute()
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/dask/base.py", line 377, in compute
(result,) = compute(self, traverse=False, **kwargs)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/dask/base.py", line 685, in compute
results = schedule(expr, keys, **kwargs)
File "/home/slb93/git-repos/cfdm/cfdm/data/dask_utils.py", line 29, in cfdm_to_memory
return np.asanyarray(a)
~~~~~~~~~~~~~^^^
File "/home/slb93/git-repos/cfdm/cfdm/data/mixin/indexmixin.py", line 60, in __array__
array = self._get_array()
File "/home/slb93/git-repos/cfdm/cfdm/data/netcdf4array.py", line 83, in _get_array
netcdf, address = self.open()
~~~~~~~~~^^
File "/home/slb93/git-repos/cfdm/cfdm/data/netcdf4array.py", line 227, in open
return super().open(netCDF4.Dataset, mode="r", **kwargs)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/slb93/git-repos/cfdm/cfdm/data/abstract/filearray.py", line 507, in open
filename = fs.open(filename, "rb")
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/spec.py", line 1349, in open
f = self._open(
path,
...<4 lines>...
**kwargs,
)
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/implementations/http.py", line 383, in _open
size = size or info.update(self.info(path, **kwargs)) or info["size"]
~~~~~~~~~^^^^^^^^^^^^^^^^
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
^^^^^^^^^^
File "/home/slb93/miniconda3/envs/cf-env-314/lib/python3.14/site-packages/fsspec/implementations/http.py", line 457, in _info
raise FileNotFoundError(url) from exc
FileNotFoundError: https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_integrated_collection.nc
Great spot, Sadie. Fixed so that reading the data after the initial file scan works: 738ae6a
sadielbartholomew
left a comment
Thanks for the quick response to the feedback. I can confirm that the new commit resolves the data reading issue and everything now works. So, besides one typo, this is good to merge, so please go ahead!
Just to note, reading the data array was very slow on my laptop, but that is fair enough given that I was on a frustratingly slow eduroam connection and that this is remote access to a fairly large array (f[10].size being 133214172) anyway. For the record, here is how bad it was (probably just an advert for using the wired connection rather than eduroam wifi!):
$ time python -c "import cfdm; f = cfdm.read('https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_in\
tegrated_collection.nc'); print(f[10])"
Field: long_name=time-averaged Large phytoplankton concentration, integrated over depth (ncvar%PhL)
...
python -c 2.17s user 0.10s system 15% cpu 14.578 total
$ time python -c "import cfdm; f = cfdm.read('https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_in\
tegrated_collection.nc'); print(f[10].data)"
[[[476.34832763671875, ..., --]]] (mg C m^-3)*m
python -c 2.28s user 0.10s system 8% cpu 26.585 total
$ time python -c "import cfdm; f = cfdm.read('https://data.pmel.noaa.gov/aclim/thredds/dodsC/B10K-K20_Level2_CORECFS_in\
tegrated_collection.nc'); print(f[10].data.array)"
[[[476.34832763671875 475.08892822265625 480.3956298828125 ...
...
python -c 5.26s user 2.10s system 1% cpu 8:08.37 total
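As a rough sanity check on that last timing, here is a back-of-envelope sketch of the implied transfer rate. The shape and wall-clock time come from the output above; the 4-byte item size is an assumption (the dtype is not shown in this thread), and the wall time also includes the import and metadata scan, so the figure is only indicative.

```python
import math

shape = (2837, 258, 182)        # from the Data repr earlier in this thread
n_elements = math.prod(shape)   # 133214172, matching the reported f[10].size
itemsize = 4                    # ASSUMPTION: 32-bit floats; dtype not confirmed
wall_seconds = 8 * 60 + 8.37    # "8:08.37 total" for the .array read

n_bytes = n_elements * itemsize
mib = n_bytes / 2**20
rate = mib / wall_seconds       # includes import and metadata scan time

print(f"{n_elements} elements, about {mib:.0f} MiB in memory")
print(f"implied rate: about {rate:.2f} MiB/s")
```

Under those assumptions the full read works out to roughly 1 MiB/s, which seems consistent with a slow wifi connection rather than a problem in the read itself.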
Fixes #406