Skip to content

Memory leak: ~70KB retained per mapped object when iterating large collections (RawStamper × Valinor's Reflection::function() cache) #7

@loevgaard

Description

@loevgaard

Summary

Long-running processes (e.g. Symfony Messenger workers) that map many objects through the SDK leak roughly 70KB per mapped object, eventually hitting the PHP memory_limit. A worker with the default 128M limit OOMs after roughly 1,300–1,800 mapped customers in a single paginate() walk.

The leak is triggered by the SDK's RawStamper converter registered on the default MapperBuilder — but the root cause is an upstream Valinor bug: CuyZ/Valinor#800 (CuyZ/Valinor#800).

Mechanism

Valinor's Reflection::function() memoizes ReflectionFunction instances in a static array keyed by spl_object_hash of a closure it creates itself:

// CuyZ\Valinor\Utility\Reflection\Reflection (Valinor 2.4.0)
$closure = Closure::fromCallable($function);
return self::$functionReflection[spl_object_hash($closure)] ??= new ReflectionFunction($closure);

Closure::fromCallable() creates a fresh closure object on every call, so the key never repeats — and because the cached ReflectionFunction retains the closure, the entries (and everything the closures capture) live forever. With any converter registered, ValueConverterNodeBuilder::unstack() goes through this path per converted value node, so every mapped Resource adds permanent entries to a static cache that GC cannot touch.

After mapping 5,000 customers (50 pages × 100):

CuyZ\Valinor\Utility\Reflection\Reflection::$functionReflection: 345,758 entries (~330M)

Reproduction (verified, Valinor 2.4.0 / SDK 1.0.0-alpha.3 / PHP 8.4)

Scripted PSR-18 client returning 50 pages of 100 customers; memory_get_usage(true) sampled per iteration:

Scenario Result
Client::get() only (transport + JSON decode, no Valinor) flat ~10M ✅
Same MapperBuilder settings without registerConverter(new RawStamper()), map() × 50 flat ~14M ✅
Default SDK builder (with RawStamper), shared TreeMapper, map() × 50 346M
Same, fresh mapper per page (what CollectionEndpoint::mapPage() does) 346M
With gc_collect_cycles() per iteration unchanged, 0 cycles collected (not a cycle leak)
After unset()ing the entire mapper graph + GC memory NOT released (static retention)

Notes from the bisect:

  • The leak magnitude is independent of whether the converter actually fires — an experiment that broke the converter's type match (so $raw was never stamped) leaked identically, because unstack() reflects the converter per node before the match check.
  • RawStamper's @template T generics are not the trigger; de-generifying changes nothing (and breaks stamping, since Valinor then refuses to match the converter's object return type against concrete node types).

Suggested fix

Stop registering the converter; stamp $raw in the endpoints after mapping, where both the decoded array and the mapped object are in hand:

  1. Remove ->registerConverter(new RawStamper()) from Client::defaultMapperBuilder() and delete RawStamper.
  2. ResourceEndpoint::getOne() / createOne() / updateOne(): after mapping, $resource->raw = $data;
  3. CollectionEndpoint::mapPage(): $page->raw = $data; plus foreach ($page->collection as $i => $item) { $item->raw = $data['collection'][$i]; } — Valinor maps lists in input order, so positional alignment holds.
  4. Regression test: map a 100-item collection 50× with the default builder and assert the size of Reflection::$functionReflection (read via reflection) is identical at iteration 10 and 50 — or simply assert bounded memory_get_usage() growth.

This removes the leak regardless of the upstream timeline (verified: the no-converter scenario is flat), and as a bonus drops the PHPStan purity suppression and the @template T of object parser workaround documented in RawStamper's docblock.

Even once CuyZ/Valinor#800 is fixed upstream (PR CuyZ/Valinor#801 is open but unmerged), the per-node converter reflection in unstack() remains avoidable overhead, so the endpoint-stamping approach is preferable either way.

Impact

Any consumer iterating large collections in a long-running process — paginate() over a few thousand records is enough. Request-scoped web usage never notices (the process dies before the leak matters), which is why this only surfaced in a queue worker doing an initial data import.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions