
Chunk index logged as rebuilt on every var access for the same variable #206

@bnlawrence

When the same variable is accessed more than once via f[var] on an already-open file f, pyfive logs "Building chunk index" / "Chunk index built" on every access, not just the first.

Environment: pyfive 1.1.1, Python 3.12

Minimal reproducer (local file):

import logging

import pyfive

# INFO level is needed to see pyfive's chunk-index messages.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")

with pyfive.File("da193a_25_3hr__198807-198807.nc", "r") as handle:
    # Subscript the same variable four times on the same open file.
    for i in range(4):
        ds = handle["/m01s03i245"]
        print(f"access {i+1}: shape={ds.shape}")

Output:

access 1: shape=(240, 324, 432)
...
access 4: shape=(240, 324, 432)
INFO pyfive.high_level: [pyfive] Accessing object '/m01s03i245' with link target 38262 (lazy access: False)
INFO pyfive.h5d: [pyfive] Building chunk index (pyfive version=1.1.1)
INFO pyfive.h5d: [pyfive] Chunk index built: btree range=(64458, 505785541); elapsed=1ms
INFO pyfive.high_level: [pyfive] Accessing object '/m01s03i245' with link target 38262 (lazy access: False)
INFO pyfive.h5d: [pyfive] Building chunk index (pyfive version=1.1.1)
INFO pyfive.h5d: [pyfive] Chunk index built: btree range=(64458, 505785541); elapsed=0ms
... [×2 more]
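
To count the messages rather than read them off the log, a handler can be attached to the `pyfive.h5d` logger. This is only a sketch: `MessageCounter` is a hypothetical helper, not part of pyfive, and the loop below emits the message by hand as a stand-in for the four `handle["/m01s03i245"]` accesses so that it runs without pyfive or the data file. The logger name and message prefix are copied from the output above.

```python
import logging


class MessageCounter(logging.Handler):
    """Count log records whose message starts with a prefix (hypothetical helper)."""

    def __init__(self, prefix):
        super().__init__(level=logging.INFO)
        self.prefix = prefix
        self.count = 0

    def emit(self, record):
        # getMessage() applies any %-style arguments before the comparison.
        if record.getMessage().startswith(self.prefix):
            self.count += 1


# Logger name and message text are taken from the log output in this issue.
h5d_logger = logging.getLogger("pyfive.h5d")
h5d_logger.setLevel(logging.INFO)
counter = MessageCounter("[pyfive] Building chunk index")
h5d_logger.addHandler(counter)

# Stand-in for the real accesses: in the reproducer this loop would be
# `ds = handle["/m01s03i245"]` repeated four times.
for _ in range(4):
    h5d_logger.info("[pyfive] Building chunk index (pyfive version=1.1.1)")

print(counter.count)
```

With the real reproducer in place of the stand-in loop, a count equal to the number of accesses would confirm that the message is logged once per subscript. It would not by itself show whether the index is re-read from the file.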

Question: Is the chunk index actually rebuilt on each access, or is the log line emitted unconditionally while the result is served from a cache? The 0ms elapsed time on accesses 2–4 suggests the latter. If, however, the B-tree is re-parsed from the file on every subscript access, that would be expensive over a remote (S3/HTTPS) fsspec filesystem, where each seek/read incurs latency.
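
One way to check this empirically would be to wrap the file object and count the I/O calls made during each access. The sketch below is an assumption-laden illustration: `CountingReader` is a hypothetical helper, and it assumes that pyfive.File accepts an already-open file-like object and performs its I/O through `read` and `seek`, neither of which has been verified here. The demo uses an in-memory buffer so it runs without pyfive.

```python
import io


class CountingReader:
    """Wrap a binary file object and count seek/read calls (hypothetical helper)."""

    def __init__(self, raw):
        self._raw = raw
        self.reads = 0
        self.seeks = 0
        self.bytes_read = 0

    def read(self, size=-1):
        data = self._raw.read(size)
        self.reads += 1
        self.bytes_read += len(data)
        return data

    def seek(self, offset, whence=0):
        self.seeks += 1
        return self._raw.seek(offset, whence)

    def __getattr__(self, name):
        # Delegate everything else (tell, close, ...) to the wrapped object.
        return getattr(self._raw, name)


# Demo on an in-memory buffer: one seek followed by one 16-byte read.
buf = CountingReader(io.BytesIO(b"x" * 1024))
buf.seek(512)
chunk = buf.read(16)
print(buf.reads, buf.seeks, buf.bytes_read)
```

If the counters stay flat between the first and later accesses to the same variable, the index is presumably served from memory and only the log line repeats. If they grow on every access, the B-tree is being re-read from the file.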
