
fix, carry per-file worker pool in a fileno-keyed singleton registry (H5Fget_access_plist strips FAPL props) #286

@steven-varga

Summary

h5::threads{N} installs the worker pool via H5Pinsert2 on the user FAPL, but every read/write dispatch re-derives the FAPL with H5Fget_access_plist, which reconstructs a FAPL from the file's library-known access properties and drops user-inserted custom properties. As a result, resolve_worker_pool returns null and pool_pipeline_t never engages: high_throughput + threads{N} silently falls back to the synchronous basic_pipeline_t.

Confirmed behavior (probes, HDF5 1.12.3 AND 1.14.6 — version-independent)

  • ✘ FAPL custom property: PRESENT on the original FAPL, ABSENT after H5Fget_access_plist (H5Pequal=0). By design per the HDF5 VFL docs — the driver fapl_get callback reconstructs only driver-specific properties, and "user-defined properties cannot currently be encoded."
  • ✔ DAPL/DCPL custom property SURVIVES H5Dget_access_plist on the same open handle (this is why high_throughput works today) — but NOT across close/reopen.
  • H5Fget_fileno is stable per physical file and shared across multiple opens (two opens → one fileno) → correct per-file key.

Chosen fix — fileno-keyed singleton registry (candidate #2 from the original analysis)

Carry the per-file worker pool in a process-global singleton keyed by H5Fget_fileno, not in the FAPL:

  • io_registry_t : h5::impl::singleton_t<io_registry_t> (MIT, vendored), lazy init via std::call_once.
  • attach at H5Fcreate/H5Fopen while the original FAPL is live (reads threads{N} before any stripping); refcount by open.
  • resolve at dispatch via H5Fget_fileno(H5Iget_file_id(ds)); optional DAPL pointer-cache for the hot path.
  • detach + drain at ~fd_t (before H5Fclose); refs→0 tears down.
  • Per-process scope (correct: HDF5 thread-safety + SWMR/MPI are per-process); one pool per physical file (multi-open shares).
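
The attach/resolve/detach lifecycle above can be sketched with the standard library alone. This is a minimal illustration, not the proposed implementation: worker_pool_t, entry_t, instance() and the member signatures are hypothetical, the unsigned long key stands in for the value H5Fget_fileno reports, and a plain static accessor replaces the vendored h5::impl::singleton_t base.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the real worker pool; it only records its size
// and whether drain() was called.
struct worker_pool_t {
    explicit worker_pool_t(std::size_t n) : n_threads(n) {}
    void drain() { drained = true; }  // real code would wait for in-flight work
    std::size_t n_threads;
    bool drained = false;
};

// Process-global registry: one pool per physical file, reference-counted by open.
class io_registry_t {
public:
    // Lazy, race-free construction on first use via std::call_once.
    static io_registry_t& instance() {
        static std::once_flag once;
        static std::unique_ptr<io_registry_t> self;
        std::call_once(once, [] { self.reset(new io_registry_t()); });
        return *self;
    }

    // Called at create/open time: the first open builds the pool, later opens
    // of the same fileno share it and only bump the reference count.
    std::shared_ptr<worker_pool_t> attach(unsigned long fileno, std::size_t n_threads) {
        assert(n_threads > 0);
        std::lock_guard<std::mutex> lock(mtx_);
        entry_t& e = entries_[fileno];
        if (!e.pool) e.pool = std::make_shared<worker_pool_t>(n_threads);
        ++e.refs;
        return e.pool;
    }

    // Called at dispatch: returns the pool for this file, or null if none is attached.
    std::shared_ptr<worker_pool_t> resolve(unsigned long fileno) {
        std::lock_guard<std::mutex> lock(mtx_);
        auto it = entries_.find(fileno);
        return it == entries_.end() ? nullptr : it->second.pool;
    }

    // Called before the file handle closes: the last detach removes the entry
    // and drains the pool outside the lock. Returns true when it tore down.
    bool detach(unsigned long fileno) {
        std::shared_ptr<worker_pool_t> last;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            auto it = entries_.find(fileno);
            if (it == entries_.end() || --it->second.refs > 0) return false;
            last = std::move(it->second.pool);
            entries_.erase(it);
        }
        last->drain();
        return true;
    }

private:
    io_registry_t() = default;
    struct entry_t {
        std::shared_ptr<worker_pool_t> pool;
        std::size_t refs = 0;
    };
    std::mutex mtx_;
    std::unordered_map<unsigned long, entry_t> entries_;
};
```

In this sketch two opens of the same file share one pool, and the thread count requested by the first open wins; whether a later threads{N} with a different N should be rejected or ignored is left open here.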

Rejected alternatives: DAPL stash (per-dataset, cannot be the per-file singleton); re-architect via #266 (couples to the larger async rework).

Scope (this issue)

  • registry mechanism + singleton_t + call_once + ~fd_t drain
  • repoint pool resolution FAPL→registry; behavior-preserving otherwise
  • pool_pipeline_t becomes reachable through the public API
  • tests: call_once first-open race, multi-open→one pool, leaked-fd_t teardown ordering
  • remove the #285 ("test, raise library line coverage above 95%") coverage exclusion for H5Zpipeline_pool.hpp

Out of scope (follow-on issue)

The single per-file IO collector model, descriptor thinning (async exec, pt_t), and the h5::sync + h5::async + threaded filter chain all build on top of this mechanism and are deferred to a follow-on issue.

Acceptance

  • pool_pipeline_t executes with threads{N} + high_throughput (non-zero H5Zpipeline_pool.hpp coverage; probe asserts resolve_* returns non-null at the dispatch site)
  • ✔ a regression test asserts the pool engages (not just that data round-trips)
  • ✔ the #285 ("test, raise library line coverage above 95%") coverage exclusion for H5Zpipeline_pool.hpp is removed
  • ✔ FAPL/DAPL/fileno behavior probes committed under test/

Supersedes the original report; the FAPL-stripping root cause and the fileno→pool registry candidate are preserved above. Behavior confirmed on HDF5 1.12.3 and 1.14.6.
