Capability registry — shared name→object vocabulary (#64) by espg · Pull Request #73 · englacial/zagg

espg · 2026-06-22T19:08:05Z

Closes #64.

Implements the plugin/capability registry as the shared name→object vocabulary for configs, Lambda payloads, and the MCP describe_products tool (#59), per the design locked with @espg on the issue thread (#64 (comment)).

What / approach

New src/zagg/registry.py — a Registry[T] class core with thin per-kind functional wrappers:

Registry[T] with register / get / list / describe / describe_all over the eight capability kinds from the June plan (spatial_func, reducer, mask_provider, field_transform, event_trigger, reader, catalog_source, credential_provider). register doubles as a decorator (@SPATIAL_FUNCS.register("max")) or a direct call.
Strings, never callables — the registry maps a public name to a private object; configs/payloads carry the name and re-resolve at the execution site.
Optional per-entry description + schema — the locked "(B) now, (C) where it earns its keep" path: entries without a schema pay nothing; describe surfaces a pydantic/JSON arg schema when a registrant supplies one. This is the additive route to the full MCP parameter surface without forcing every capability author to write a schema up front.
UnknownCapability(name, kind, available=[...]) — subclasses KeyError (so existing except KeyError paths still catch it) while carrying the kind + sorted available names for a clean "did you mean…" / MCP-relayable error.
Lazy zagg.plugins entry-point discovery — first get/list triggers a one-time sweep; a broken plugin's load()/register() is logged-and-skipped, and a failure in entry_points() itself leaves the seam retryable. Includes the re-entrancy guard for a plugin whose register() calls back through get.
describe_all() returns the structured {kind: [{name, kind, description, schema?}, …]} shape that #59 (MCP Server) consumes; registry_snapshot() keeps the name-only view for diagnostics.
Per-kind functional wrappers (register_spatial_func / get_spatial_func / list_spatial_funcs, ×8) so call sites read clearly and miss-messages name the kind. The names match the existing surface of #70 (Temporal aggregation infrastructure, Refs #12), so its rebase (below) is minimal.

Built-ins register nothing yet — this ships the seam alone; the temporal engine's mask providers / field transforms / etc. register into it as their owning modules land.

Relationship to #70

#70's Phase 2 shipped a functional registry; this PR lands the canonical Registry[T] class core per #64's locked design. As agreed (rebase, not restart — #63 (comment)), #70 rebases its registry onto this one and drops its copy; the functional wrapper names are kept identical here so that rebase is near-mechanical.

Phases

Registry[T] core + the eight registries + UnknownCapability + optional description/schema
Lazy zagg.plugins entry-point discovery (re-entrancy guard + retry-on-lookup-failure)
describe_all() structured MCP surface + registry_snapshot() + per-kind functional wrappers
Tests (tests/test_registry.py)

How tested

uv run pytest tests/test_registry.py -q → 49 passed. Covers: the eight-registry invariant; direct + decorator registration round-trips across all eight; duplicate→ValueError unless replace=True; empty/non-string name→ValueError; UnknownCapability carries kind + available and is a KeyError; optional description/schema surfaced by describe/describe_all (and omitted when absent); structured + sorted describe_all; lazy discovery (entry-point register runs on first get, broken load()/register() skipped, entry_points() failure retries, re-entrancy, discover_plugins(force=True)); and the name→same-object invariant.
uv run ruff check --select=E,F,W,I --ignore=E501 src/zagg/registry.py tests/test_registry.py → clean.

Questions for review

Functional wrapper naming: kept list_spatial_funcs (etc.) to match the existing surface of #70 (Temporal aggregation infrastructure, Refs #12) for a minimal rebase; the issue body wrote list_spatial_functions() illustratively. Flag if you'd prefer the longer form as the canonical name.
The schema field type is left as Any (a pydantic model or a JSON-schema dict) until #59 (MCP Server) pins the MCP parameter encoding. Confirm that's the right level of looseness for now.

Authored by Claude for @espg under the #64 routine.

Generated by Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EN7X53ZAyaCca9rjwv9qZw

espg

🤖 from Claude (review)

Fresh-context adversarial review of the #64 registry seam. The design matches the locked spec (eight kinds, Registry[T] core, optional description/schema, UnknownCapability(name, kind, available=[...]), lazy zagg.plugins discovery with re-entrancy guard + retry, describe_all(), per-kind wrappers, strings-not-callables). Tests are well-structured and the runtime behavior is correct. Local verification:

pytest tests/test_registry.py -q → 49 passed ✅
ruff check --select=E,F,W,I --ignore=E501 → clean ✅ (matches the lint bot)
ruff format --check → fails (would reformat registry.py) ❌
mypy src/zagg/registry.py → 1 error in this file ❌ (list[...] annotations shadowed by the list() method)

CI (test.yml/lint.yml) runs only pytest + the ruff E,F,W,I bot, so GitHub checks will go green — but CLAUDE.md §4/§7 require pre-commit run --all-files (ruff-format + mypy + codespell) to pass locally, and two of those fail. Flagging as blocking-for-"done" rather than blocking-CI.

Blocking for local-green (§4):

1. registry.py:151, mypy [valid-type]: the list() method shadows the builtin list for the in-class list[...] annotations (the describe_all/list return annotations).
2. registry.py is not ruff format-clean (3 spots: the register_credential_provider signature, the logger.exception call, and a blank line after if obj is None:).

Non-blocking findings:
3. test_register_reentrancy_during_discovery doesn't actually re-enter discovery — it touches _entries directly instead of a get/__contains__ path, so the _DISCOVERING guard is uncovered (test would pass with the guard removed).
4. Lazy discovery isn't thread-safe — concurrent first-time get() under a ThreadPoolExecutor can double-run the sweep; either a stdlib threading.Lock or an explicit "pre-warm via discover_plugins(); lazy path isn't thread-safe" contract note.
5. __contains__-triggers-discovery is intentional but untested via the plugin path.
6. discover_plugins(force=True) resets only _DISCOVERED, not _entries, so a real forced re-sweep trips the dup guard unless every plugin uses replace=True (the test only passes because its fake does).
7. schema is stored/returned by reference (mutable-dict aliasing) — fine for now, worth a one-line docstring caveat.

On the PR's own "Questions for review": the Any looseness on schema reads reasonable as a placeholder until #59 pins the encoding; the list_spatial_funcs naming is an @espg call, not a correctness issue. No dependency or scope concerns — module is stdlib-only (~450 lines, well under the ~1000 limit), terse, and convention-matching. Items 1–2 are the only must-fix-before-done.

Generated by Claude Code

espg · 2026-06-22T19:09:58Z

+            raise UnknownCapability(name, self.kind, sorted(self._entries)) from None
+        return _entry_dict(entry, self.kind)
+
+    def describe_all(self) -> list[dict[str, Any]]:


🤖 from Claude (review)

Blocking (mypy / pre-commit): the method def list(self) (line 138) shadows the builtin list inside the class body, so subsequent list[...] annotations resolve to Registry.list rather than the builtin. With from __future__ import annotations mypy resolves the string annotations in class scope and errors here:

src/zagg/registry.py:151: error: Function "zagg.registry.Registry.list" is not valid as a type [valid-type]

I reproduced it with uv run mypy src/zagg/registry.py (mypy 1.19.1, the pinned pre-commit version). CI's test.yml/lint.yml don't run mypy, so the GitHub checks stay green — but CLAUDE.md §4/§7 require pre-commit run --all-files (ruff + mypy + codespell) to be green locally, and this fails it. Fixes: alias the builtin (_list = list at module top and annotate with _list[str]), or rename the method (names()), or wrap return types in quotes that mypy resolves outside class scope. Flagging rather than dictating since the public .list() name is part of the agreed surface.
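As an illustration of the first suggested fix, the alias approach might look like this. It is a stripped-down stand-in for the real class (no generics, no discovery), and the claim that mypy accepts it rests on `_list = list` being treated as an ordinary type alias; it has not been run against the pinned mypy here.

```python
from typing import Any

_list = list  # alias captured at module scope, before any class defines `list`


class Registry:
    def __init__(self) -> None:
        self._entries: dict = {}

    def list(self) -> _list[str]:
        """Public method name stays `list`; the annotation uses the alias."""
        return sorted(self._entries)

    def describe_all(self) -> _list[dict[str, Any]]:
        # A bare `list[...]` here would name the method above in class scope.
        return [{"name": name} for name in self.list()]
```

The rename option (names()) avoids the alias altogether, at the cost of changing the method name on the agreed surface; the per-kind list_* wrappers would be unaffected either way.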

Generated by Claude Code

espg · 2026-06-22T19:10:02Z

+    return CATALOG_SOURCES.list()
+
+
+def register_credential_provider(name, provider=None, *, description="", schema=None, replace=False):


🤖 from Claude (review)

Blocking (ruff format --check): this file is not ruff format-clean. uv run ruff format --check src/zagg/registry.py reports it would reformat, with three changes: a blank line after if obj is None: (line 113), collapsing the logger.exception(...) call (lines 226–228) onto one line, and wrapping this register_credential_provider signature across lines. Per CLAUDE.md §4, pre-commit run --all-files (which includes ruff-format) must be green. Run uv run ruff format src/zagg/registry.py. (The ruff check bot at E,F,W,I passes — it's only the formatter that flags this.)

Generated by Claude Code

espg · 2026-06-22T19:10:08Z

+        def plugin_register():
+            register_field_transform("a", object())
+            # Touch a get during discovery — should short-circuit, not recurse.
+            assert "a" in registry.FIELD_TRANSFORMS._entries


🤖 from Claude (review)

Test fidelity gap. This test is named/commented as exercising the re-entrancy guard ("Touch a get during discovery — should short-circuit, not recurse"), but the plugin body reaches into the raw dict registry.FIELD_TRANSFORMS._entries, which does not call _ensure_discovered(). So _DISCOVERING is never re-entered and the guard at registry.py:219 is not actually covered — the test would still pass if the guard were deleted. To genuinely exercise it, call a path that re-enters discovery from inside the plugin's register(), e.g. get_field_transform("a") or "a" in registry.FIELD_TRANSFORMS (the __contains__ path), and assert it returns rather than recursing/raising. Non-blocking, but it's the one safety property here whose test doesn't bite.

Generated by Claude Code

espg · 2026-06-22T19:10:15Z

+    seam permanently).
+    """
+    global _DISCOVERED, _DISCOVERING
+    if _DISCOVERED or _DISCOVERING:


🤖 from Claude (review)

Thread-safety (worth a one-line note, likely non-blocking). _DISCOVERED/_DISCOVERING are plain module globals mutated without a lock. The re-entrancy guard correctly handles same-thread recursion, but if two threads hit a first-time get()/list() concurrently (zagg runs under a ThreadPoolExecutor in places), both can observe _DISCOVERED==False and _DISCOVERING==False, both set _DISCOVERING=True, and both run the sweep — invoking each plugin's register() twice and tripping the duplicate-name ValueError inside the loop (caught + logged, so plugins half-register silently). A single module-level threading.Lock around the sweep (stdlib, no new dep) would make it once-per-interpreter under concurrency too. If the contract is "discovery always happens at single-threaded import via discover_plugins()", a one-line docstring note saying lazy discovery is not thread-safe and the Lambda path must pre-warm would suffice. Either way, flag the intended contract.

Generated by Claude Code

espg · 2026-06-22T19:10:19Z

+
+    def __contains__(self, name: object) -> bool:
+        _ensure_discovered()
+        return name in self._entries


🤖 from Claude (review)

Nit (coverage gap, not a bug). __contains__ triggers _ensure_discovered() — a deliberate and reasonable choice (membership is a lookup, so it should see plugin entries) — but no test asserts that x in REGISTRY actually drives discovery. test_register_direct_then_get exercises in only after a direct registration. Consider one assertion that "from_plugin" in registry.SPATIAL_FUNCS returns True pre-get via the entry-point path, mirroring test_entry_point_register_runs_on_first_get. This would also be the natural place to land the re-entrancy fix from the other comment.

Generated by Claude Code

espg · 2026-06-22T19:10:30Z

+        _DISCOVERING = False
+
+
+def discover_plugins(*, force: bool = False) -> None:


🤖 from Claude (review)

Sharp edge in force=True (doc'd, but a footgun). force=True only resets _DISCOVERED, not the entries already registered, so a real re-sweep re-runs every plugin register() against a non-empty registry and trips the duplicate-name guard unless every plugin passes replace=True. The test test_discover_plugins_force_reruns only passes because its fake plugin uses replace=True. A built-in plugin that doesn't would raise on the forced sweep (caught + logged inside the loop, so the re-discovery silently no-ops for that plugin). Since force is documented as "intended for tests," consider either making it test-only/private or having it also clear _entries so a forced re-sweep starts clean and is deterministic. Non-blocking; flag the intent.

Generated by Claude Code

espg · 2026-06-22T19:10:35Z

+    name: str
+    obj: T
+    description: str = ""
+    schema: Any | None = None


🤖 from Claude (review)

Minor (mutability of schema). Entry is frozen=True (good — the dataclass can't be rebound), but schema holds an arbitrary object and describe()/describe_all() return it by reference (_entry_dict does out["schema"] = entry.schema, and test_describe_with_schema_includes_it asserts is). If a registrant passes a mutable dict schema, an MCP consumer can mutate the registry's stored schema in place. Probably fine for now (schemas are expected to be static pydantic models / literals), but worth a one-line note in the schema docstring that it's stored and returned by reference and should be treated as immutable. Not blocking.
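The aliasing is easy to reproduce in isolation. In the sketch below, Entry follows the four fields shown in the diff, while the two helper functions are hypothetical: the first imitates the by-reference behaviour described for _entry_dict, the second shows a defensive deep copy as one alternative to a docstring caveat.

```python
import copy
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Entry:
    name: str
    obj: Any
    description: str = ""
    schema: Optional[Any] = None


def entry_dict_by_reference(entry: Entry, kind: str) -> dict:
    out = {"name": entry.name, "kind": kind, "description": entry.description}
    if entry.schema is not None:
        out["schema"] = entry.schema  # same object the registry holds
    return out


def entry_dict_copied(entry: Entry, kind: str) -> dict:
    out = entry_dict_by_reference(entry, kind)
    if "schema" in out:
        out["schema"] = copy.deepcopy(out["schema"])  # caller gets its own copy
    return out


entry = Entry("max", max, schema={"type": "object", "properties": {}})

# frozen=True stops rebinding entry.schema, not mutation of the dict itself.
entry_dict_copied(entry, "spatial_func")["schema"]["type"] = "array"
still_object = entry.schema["type"]

entry_dict_by_reference(entry, "spatial_func")["schema"]["type"] = "array"
now_array = entry.schema["type"]
```

Copying would break the identity assertion in test_describe_with_schema_includes_it, which is one reason the docstring note is the lighter-weight choice.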

Generated by Claude Code


espg · 2026-06-22T19:24:24Z

🤖 from Claude

Two things — folding the self-review, and the red morton tests.

Self-review findings — addressed in `a3b0ef1`

mypy error (registry.py, list() shadowing the builtin in annotations) → renamed the method list() → names(); the public list_* functional wrappers are unchanged. mypy src/zagg/registry.py is clean now.
ruff format → ran it on both new files; ruff format --check passes.
Thread-safety of the discovery flags → added a module-level threading.RLock + double-checked locking in _ensure_discovered (the seam runs under the Lambda ThreadPoolExecutor, so two first-time get()s could double-run the sweep). RLock keeps the same-thread plugin re-entrancy deadlock-free; the _DISCOVERED fast-path still short-circuits a plugin's register() calling back through get().
Test fidelity (re-entrancy test) → test_register_reentrancy_during_discovery now drives the guard through real __contains__ + names() paths (both call _ensure_discovered) instead of poking _entries directly.
schema stored by reference → added a one-line caveat in register()'s docstring (stored/returned as-is; pass an owned/immutable object).
Standing (non-blocking): discover_plugins(force=True) resets only the discovered flag — its docstring already documents that a forced re-sweep needs replace=True per plugin; that's the intended test-hook contract, left as-is.

Green after the fold: ruff check ✓, ruff format --check ✓, mypy registry.py ✓, 49 tests ✓.

The red morton tests are pre-existing, not from this PR

This PR's diff is registry.py + test_registry.py only (git diff --name-only origin/main... confirms) — it never touches the morton/HEALPix write path.
tests/test_processing.py::TestWriteDataframeToZarr::test_write_dataframe_to_zarr (+ the two integration cases) fail identically on a clean main. It's not a morton-vs-nested scheme mismatch; it's the same morton values read signed vs unsigned: ACTUAL[-5734771175502905336] as uint64 is exactly DESIRED[12711972898206646280], and the reported max abs difference is exactly 2^64. So it's the morton: int64 sign-bit hazard (negative for HEALPix base cells 7–11), the one that #71 (Adopt mortie v1 MortonIndexDtype for the morton coordinate, keep NESTED cell_ids) fixes by moving the coordinate to MortonIndexDtype.
Per CLAUDE.md §4, flagging not fixing: out of scope for the registry, resolved under #71.
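The signed-versus-unsigned claim above can be checked with the standard library alone, using the two values quoted from the failing assertion. The sketch only verifies the arithmetic; it says nothing about where in the write path the reinterpretation happens.

```python
import struct

ACTUAL = -5734771175502905336    # value read back, interpreted as int64
DESIRED = 12711972898206646280   # value expected, interpreted as uint64

# Reinterpret the same 8 bytes as an unsigned 64-bit integer.
reinterpreted = struct.unpack("<Q", struct.pack("<q", ACTUAL))[0]

assert reinterpreted == DESIRED
assert DESIRED - ACTUAL == 2**64  # the reported max abs difference
assert DESIRED >= 2**63           # top bit set, hence negative as int64
```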

So #73's own surface is green; the red X is the standing morton sign issue tracked by #71.

Generated by Claude Code

registry capability vocabulary with optional per-entry schema (#64)

61b3f1e

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01EN7X53ZAyaCca9rjwv9qZw

espg added the implement label Jun 22, 2026


fold #73 review: thread-safe lazy discovery, drop list-shadow, ruff-format (#64)

a3b0ef1

espg marked this pull request as ready for review June 22, 2026 19:31

espg merged commit 9f7c1d4 into main Jun 22, 2026
7 of 9 checks passed

espg pushed a commit that referenced this pull request Jun 22, 2026

merge main: reconcile onto #73 registry + #76 dispatch (#12)

fbfd865

This was referenced Jun 22, 2026

Temporal aggregation infrastructure (Refs #12) #70 (Draft)
Backend support (i.e., local processing support) #20 (Open)
Claude Summary Tracking (June 2026) #50 (Open)
Follow up items for the week of June 23, 2026 #109 (Open)

espg merged 2 commits into main from claude/64-registry
