fix(analysis): revert _clone_collector to copy.deepcopy (#293)#299
Merged
Conversation
Adds tests/resolver/test_collector_subclass.py with five scenarios
covering the ICollector / Collector subclassing surface. On current
master, 4 of 5 fail with AttributeError pointing at the clone
mechanism:
- test_map_collector_dedupes
MapCollector implements ICollector directly, val is dict, config
is key_fn. Fails: 'MapCollector' object has no attribute 'key_fn'.
- test_sibling_branches_isolated
Two Post nodes under one Root, each should see only its own
comments. Same root cause as above (clone drops key_fn).
- test_sequential_resolve_no_leak
Resolver reused across two trees; second tree must not see the
first tree's state. Same root cause.
- test_topn_collector_preserves_n_config
TopNCollector(Collector) adds `n` config in __init__. Fails:
'TopNCollector' object has no attribute 'n'.
- test_simple_subcollector_still_works
Backward-compat baseline: SubCollector that only overrides add()
(matches tests/resolver/test_35_collector.py:8). PASSES today and
MUST keep passing after the fix.
No production code changes. This commit only locks in the bug surface
so we can agree on the fix before implementing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bfba588 replaced copy.deepcopy with a hand-written _clone_collector that bypassed __init__ via cls.__new__(cls) and copied only alias/flat/ val. This was a perf optimization, but it silently broke every collector shape beyond the default list-based Collector: - Direct ICollector implementations lost any attribute set in __init__ (e.g. key_fn) — AttributeError on first add(). - Collector subclasses with extra __init__ config (e.g. n) lost it. - Collectors whose val is a dict/set/custom aggregator had their val hard-coded back to [] by the framework. Revert to copy.deepcopy. This is transparent to users — any collector implementation works without modification. Perf: micro-benchmark shows 2.89 us/call (deepcopy) vs 0.13 us/call (hand-written) on a typical proto, a 22x ratio but absolute cost is dwarfed by IO in any realistic post_* workload. End-to-end benchmarks show 0% impact — no benchmark in benchmarks/ exercises collectors, and the 10-14% gains advertised in bfba588 came from the other optimizations in that commit (object.__setattr__, isinstance fast path, metadata caching), not from the collector clone change. Tests: tests/resolver/test_collector_subclass.py (added in prior commit on this branch) covers 5 scenarios. Pre-fix: 1/5 pass. Post-fix: 5/5. - MapCollector (ICollector, dict val, key_fn config) - Sibling branch isolation - Sequential resolve() on same Resolver - TopNCollector (Collector subclass with n config) - SimpleSubCollector (backward-compat baseline) Full suite: 786 passed, 1 skipped. Ruff clean. Resolves #293. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resolves #293.
Summary
bfba588(2026-06-09) replacedcopy.deepcopywith a hand-written_clone_collectorthat bypassed__init__viacls.__new__(cls)and copied onlyalias/flat/val. This was a perf optimization, but it silently broke every collector shape beyond the default list-basedCollector:ICollectorimplementations lost any attribute set in__init__(e.g.key_fn) →AttributeErroron firstadd().Collectorsubclasses with extra__init__config (e.g.n,window_size) lost it.valis a dict/set/custom aggregator had theirvalhard-coded back to[]by the framework.This PR reverts to
copy.deepcopy. Transparent to users — any collector implementation works without modification.Change
Perf
Concern: how much does this give back from
bfba588?Micro-benchmark (
Collector('alias')proto, 100k iterations):copy.deepcopy_clone_collectorThe 22x ratio is real but the absolute cost is tiny — typical Collector proto has 3 simple attributes (alias: str, flat: bool, val: list).
End-to-end benchmarks (
pytest benchmarks/):15 scenarios ran on both versions. Differences were within ±10% and bidirectional — pure noise (each benchmark's own StdDev is 30-100% of mean). No measurable regression.
bfba588's advertised 10-14% gains came from the other optimizations in that commit (object.__setattr__, isinstance fast path, metadata caching), not from the collector clone change.Realistic scale (worst-case estimate):
For context: a single SQL query is 1–10 ms. Collectors live in
post_*methods, which are typically IO-bound (DataLoader batches) — the deepcopy overhead is invisible.Test plan
tests/resolver/test_collector_subclass.py(added in prior commit on this branch) — 5 scenarios. Pre-fix: 1/5 pass. Post-fix: 5/5 pass.MapCollector(implements ICollector, dict val,key_fnconfig)resolve()on same ResolverTopNCollector(Collector subclass withnconfig)SimpleSubCollector(backward-compat baseline — must not regress)ruff checkclean.🤖 Generated with Claude Code