fix(analysis): revert _clone_collector to copy.deepcopy (#293) by allmonday · Pull Request #299 · KLR-Pattern/pydantic-resolve

allmonday · 2026-06-25T00:38:04Z

Resolves #293.

Summary

bfba588 (2026-06-09) replaced copy.deepcopy with a hand-written _clone_collector that bypassed __init__ via cls.__new__(cls) and copied only alias/flat/val. This was a perf optimization, but it silently broke every collector shape beyond the default list-based Collector:

- Direct ICollector implementations lost any attribute set in __init__ (e.g. key_fn) → AttributeError on first add().
- Collector subclasses with extra __init__ config (e.g. n, window_size) lost it.
- Collectors whose val is a dict/set/custom aggregator had their val hard-coded back to [] by the framework.
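
The failure mode in the list above can be reproduced in isolation. This is a minimal sketch, not the library's code: `MapCollector` here is a hypothetical stand-in that does not inherit from the real ICollector, and `clone_via_new` mirrors the removed lines shown in this PR's diff.

```python
class MapCollector:
    """Hypothetical collector: dedupes items into a dict keyed by key_fn."""

    def __init__(self, alias, key_fn):
        self.alias = alias
        self.key_fn = key_fn  # config that only exists if __init__ runs
        self.val = {}

    def add(self, item):
        self.val[self.key_fn(item)] = item

    def values(self):
        return list(self.val.values())


def clone_via_new(collector):
    """Mirrors the hand-written clone removed by this PR."""
    cls = collector.__class__
    new = cls.__new__(cls)  # allocates the object without running __init__
    new.alias = collector.alias
    if hasattr(collector, "flat"):
        new.flat = collector.flat
    new.val = []  # hard-coded list, regardless of the original val type
    return new


proto = MapCollector("comments", key_fn=lambda item: item["id"])
clone = clone_via_new(proto)

lost_key_fn = not hasattr(clone, "key_fn")  # config attribute was never copied
val_type = type(clone.val).__name__         # dict was replaced by a list

try:
    clone.add({"id": 1})
    error = None
except AttributeError as exc:
    error = str(exc)

print(lost_key_fn, val_type, error)
```

In this sketch both problems show up on the same object: the clone has no `key_fn`, and its `val` is a list even though the prototype used a dict.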

This PR reverts to copy.deepcopy. The change is transparent to users: any collector implementation works without modification.

Change

+ import copy
  ...
  def _clone_collector(collector):
-     cls = collector.__class__
-     new = cls.__new__(cls)
-     new.alias = collector.alias
-     if hasattr(collector, 'flat'):
-         new.flat = collector.flat
-     new.val = []
-     return new
+     return copy.deepcopy(collector)
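
The effect of the one-line replacement can be sketched as follows. `TopNCollector` below is a hypothetical stand-in written for illustration (it does not subclass the real Collector); only `_clone_collector` is taken from the diff above.

```python
import copy


class TopNCollector:
    """Hypothetical collector with extra __init__ config (n)."""

    def __init__(self, alias, n):
        self.alias = alias
        self.flat = False
        self.n = n
        self.val = []

    def add(self, item):
        # Keep only the first n items.
        if len(self.val) < self.n:
            self.val.append(item)

    def values(self):
        return self.val


def _clone_collector(collector):
    return copy.deepcopy(collector)


proto = TopNCollector("top_comments", n=2)
a = _clone_collector(proto)
b = _clone_collector(proto)

for item in ("x", "y", "z"):
    a.add(item)
b.add("only-b")

print(a.n, a.values())  # config preserved, state local to clone a
print(b.n, b.values())  # clone b does not see a's items
print(proto.values())   # the prototype itself is untouched
```

Each clone carries the `n` config and gets its own `val`, so sibling clones and the prototype do not share state. Note that `copy.deepcopy` returns functions unchanged rather than duplicating them, so a callable config such as `key_fn` would be shared between clones.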

Perf

Concern: how much of bfba588's gain does this give back?

Micro-benchmark (Collector('alias') proto, 100k iterations):

| Implementation | per call | ratio |
|---|---|---|
| `copy.deepcopy` | 2.89 us | 22x slower |
| hand-written `_clone_collector` | 0.13 us | baseline |

The 22x ratio is real, but the absolute cost is tiny: a typical Collector proto has 3 simple attributes (alias: str, flat: bool, val: list).
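
A micro-benchmark of this shape could look like the sketch below. It is not the script used for the numbers above: the `Collector` class is a hypothetical minimal proto with the three attributes just listed, and absolute timings will vary by machine and Python version.

```python
import copy
import timeit


class Collector:
    """Hypothetical minimal proto: alias (str), flat (bool), val (list)."""

    def __init__(self, alias, flat=False):
        self.alias = alias
        self.flat = flat
        self.val = []


def clone_deepcopy(collector):
    return copy.deepcopy(collector)


def clone_handwritten(collector):
    # Same steps as the clone removed by this PR.
    cls = collector.__class__
    new = cls.__new__(cls)
    new.alias = collector.alias
    if hasattr(collector, "flat"):
        new.flat = collector.flat
    new.val = []
    return new


ITERATIONS = 100_000
proto = Collector("alias")


def per_call_us(fn, n=ITERATIONS):
    """Average wall-clock time per call, in microseconds."""
    total_seconds = timeit.timeit(lambda: fn(proto), number=n)
    return total_seconds / n * 1e6


deep_us = per_call_us(clone_deepcopy)
hand_us = per_call_us(clone_handwritten)

print(f"copy.deepcopy: {deep_us:.2f} us/call")
print(f"hand-written:  {hand_us:.2f} us/call")
print(f"ratio:         {deep_us / hand_us:.1f}x")
```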

End-to-end benchmarks (pytest benchmarks/):

15 scenarios were run on both versions. Differences were within ±10% and went in both directions, i.e. pure noise (each benchmark's own StdDev is 30-100% of its mean). No measurable regression. bfba588's advertised 10-14% gains came from the other optimizations in that commit (object.__setattr__, isinstance fast path, metadata caching), not from the collector clone change.

Realistic scale (worst-case estimate):

| Clone count per resolve | Added overhead |
|---|---|
| 10 | 0.03 ms |
| 200 | 0.55 ms |
| 1 000 | 2.75 ms |
| 10 000 | 27.5 ms |

For context: a single SQL query is 1–10 ms. Collectors live in post_* methods, which are typically IO-bound (DataLoader batches), so the deepcopy overhead is invisible.
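
The worst-case table can be approximated from the micro-benchmark figures alone. The sketch below assumes the added cost per clone is simply the difference between the two per-call timings (2.89 us minus 0.13 us, i.e. 2.76 us); the table's larger rows appear to use a per-clone delta rounded to about 2.75 us, so the last digits differ slightly.

```python
DEEPCOPY_US = 2.89     # per-call cost of copy.deepcopy, from the micro-benchmark
HANDWRITTEN_US = 0.13  # per-call cost of the hand-written clone

# Assumption: added overhead per clone is the plain difference of the two.
DELTA_US = DEEPCOPY_US - HANDWRITTEN_US


def added_overhead_ms(clones):
    """Estimated extra time per resolve, in milliseconds."""
    return clones * DELTA_US / 1000.0


for clones in (10, 200, 1_000, 10_000):
    print(f"{clones:>6} clones -> {added_overhead_ms(clones):.2f} ms")
```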

Test plan

tests/resolver/test_collector_subclass.py (added in prior commit on this branch) — 5 scenarios. Pre-fix: 1/5 pass. Post-fix: 5/5 pass.
- MapCollector (implements ICollector, dict val, key_fn config)
- Sibling branch isolation (two Post nodes under one Root)
- Sequential resolve() on same Resolver
- TopNCollector (Collector subclass with n config)
- SimpleSubCollector (backward-compat baseline — must not regress)
Full suite: 786 passed, 1 skipped.
ruff check clean.

🤖 Generated with Claude Code

Adds tests/resolver/test_collector_subclass.py with five scenarios covering the ICollector / Collector subclassing surface. On current master, 4 of 5 fail with AttributeError pointing at the clone mechanism:

- test_map_collector_dedupes
  MapCollector implements ICollector directly, val is dict, config is key_fn. Fails: 'MapCollector' object has no attribute 'key_fn'.
- test_sibling_branches_isolated
  Two Post nodes under one Root, each should see only its own comments. Same root cause as above (clone drops key_fn).
- test_sequential_resolve_no_leak
  Resolver reused across two trees; second tree must not see the first tree's state. Same root cause.
- test_topn_collector_preserves_n_config
  TopNCollector(Collector) adds `n` config in __init__. Fails: 'TopNCollector' object has no attribute 'n'.
- test_simple_subcollector_still_works
  Backward-compat baseline: SubCollector that only overrides add() (matches tests/resolver/test_35_collector.py:8). PASSES today and MUST keep passing after the fix.

No production code changes. This commit only locks in the bug surface so we can agree on the fix before implementing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

bfba588 replaced copy.deepcopy with a hand-written _clone_collector that bypassed __init__ via cls.__new__(cls) and copied only alias/flat/val. This was a perf optimization, but it silently broke every collector shape beyond the default list-based Collector:

- Direct ICollector implementations lost any attribute set in __init__ (e.g. key_fn) → AttributeError on first add().
- Collector subclasses with extra __init__ config (e.g. n) lost it.
- Collectors whose val is a dict/set/custom aggregator had their val hard-coded back to [] by the framework.

Revert to copy.deepcopy. This is transparent to users: any collector implementation works without modification.

Perf: micro-benchmark shows 2.89 us/call (deepcopy) vs 0.13 us/call (hand-written) on a typical proto, a 22x ratio but absolute cost is dwarfed by IO in any realistic post_* workload. End-to-end benchmarks show 0% impact: no benchmark in benchmarks/ exercises collectors, and the 10-14% gains advertised in bfba588 came from the other optimizations in that commit (object.__setattr__, isinstance fast path, metadata caching), not from the collector clone change.

Tests: tests/resolver/test_collector_subclass.py (added in prior commit on this branch) covers 5 scenarios. Pre-fix: 1/5 pass. Post-fix: 5/5.

- MapCollector (ICollector, dict val, key_fn config)
- Sibling branch isolation
- Sequential resolve() on same Resolver
- TopNCollector (Collector subclass with n config)
- SimpleSubCollector (backward-compat baseline)

Full suite: 786 passed, 1 skipped. Ruff clean.

Resolves #293.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

allmonday and others added 2 commits June 25, 2026 08:06

allmonday merged commit d902480 into master Jun 25, 2026
1 check passed

allmonday merged 2 commits into master from fix/293-collector-clone-deepcopy
