[BUGFIX] Flush runtime cache periodically in LinkAnalyzer #484
Open
mikelwohlschlegel wants to merge 2 commits into
The brofix:checklinks CLI command runs into PHP fatal "Allowed memory size exhausted" errors on large site trees because TYPO3's runtime cache (TransientMemoryBackend) accumulates entries that are never freed:

- backendUtilityBeGetRootLine: rootlines for every visited page
- backendUtilityPageForRootLine: full page records for rootline resolution
- pageTsConfig-hash-to-object-*: parsed PageTSconfig objects
- backendUtilityTscPidCached: real-PID cache entries

These are populated transitively by BackendUtility::getRecord(), BEgetRootLine() and getPagesTSconfig() inside isRecordsOnPageShouldBeChecked() and the LinkParser pipeline.

Flushing only between array_chunk() iterations of page IDs is not sufficient on installations where one chunk fits the whole site tree (e.g. ~18k pages with the 32k bind-parameter limit on MariaDB).

Flush every 1000 processed records inside the inner loop so memory growth is bounded regardless of chunk size, plus a final flush per chunk for any remaining entries.

The flush interval N=1000 was empirically tuned against an 18'320 pages / 90'110 tt_content install at a 450M memory_limit. Surprisingly, smaller N (100, 200, 400) all plateaued at ~461-468 MB RSS, while N=1000 plateaued at ~382 MB and N=2000 at ~439 MB. Smaller intervals appear to produce heap fragmentation through frequent allocate/free cycles of the runtime cache structures, while too large intervals let the cache grow back. N=1000 hit the sweet spot. Without the per-record flush at all, RSS grew unbounded past 1066 MB at a 1024M memory_limit.

The runtime cache is request-scoped and rebuilt on demand, so flushing has no functional impact. CacheManager is injected as a nullable constructor argument with a GeneralUtility::makeInstance() fallback to keep the public API backwards compatible.
Owner
Thanks @mikelwohlschlegel for your work. That looks pretty awesome. I am looking at it. Looks pretty straightforward. I also created an issue to track memory optimizations: #485. I resolved the conflicts. (You might have based your PR on an older version; please be sure to update your fork.)
What this PR does
Bounds the memory footprint of brofix:checklinks on long CLI runs by
flushing TYPO3's runtime cache every 1000 processed records inside the
inner loop of LinkAnalyzer::generateBrokenLinkRecords(), plus a final
flush at the end of each chunk.
CacheManager is added as a nullable constructor argument with a
GeneralUtility::makeInstance() fallback to keep the public API
backwards compatible.
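The backwards-compatible constructor described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: StubCacheManager, makeInstanceStub() and LinkAnalyzerSketch are hypothetical stand-ins for TYPO3's CacheManager, GeneralUtility::makeInstance() and LinkAnalyzer, so the snippet runs without TYPO3.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for TYPO3's CacheManager (assumption, not the real class).
final class StubCacheManager
{
}

// Hypothetical stand-in for GeneralUtility::makeInstance().
function makeInstanceStub(string $className): object
{
    return new $className();
}

// Hypothetical, reduced version of LinkAnalyzer showing only the constructor pattern.
final class LinkAnalyzerSketch
{
    private StubCacheManager $cacheManager;

    // The new argument is trailing and nullable, so callers that do not
    // pass it (the pre-existing call sites) keep working unchanged.
    public function __construct(?StubCacheManager $cacheManager = null)
    {
        $this->cacheManager = $cacheManager ?? makeInstanceStub(StubCacheManager::class);
    }

    public function getCacheManager(): StubCacheManager
    {
        return $this->cacheManager;
    }
}

// Old-style call: no argument, the fallback creates the instance.
$legacy = new LinkAnalyzerSketch();

// New-style call: the dependency is injected explicitly.
$injectedManager = new StubCacheManager();
$injected = new LinkAnalyzerSketch($injectedManager);
```

The fallback keeps existing instantiations valid while still allowing the dependency to be injected by the container or replaced in tests.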
Why
Running brofix:checklinks over a large site tree from CLI causes a
PHP fatal "Allowed memory size exhausted" error before the run finishes.
The call site at which the error is raised varies depending on cache
backend configuration; it is just the allocation that happens to cross
the limit. The actual buildup is in TYPO3's runtime cache
(cache_runtime, TransientMemoryBackend).
Where the memory comes from
Inside LinkAnalyzer::generateBrokenLinkRecords(), the inner
while ($row = $result->fetchAssociative()) loop calls
isRecordsOnPageShouldBeChecked() and the link-parser pipeline, which
transitively invoke:
- BackendUtility::getRecord()
- BackendUtility::BEgetRootLine()
- BackendUtility::getPagesTSconfig()
Each populates entries in TYPO3's runtime cache:
- backendUtilityBeGetRootLine: rootlines for every visited page
- backendUtilityPageForRootLine: full page records for rootline resolution
- pageTsConfig-hash-to-object-*: parsed PageTSconfig objects
- backendUtilityTscPidCached: real-PID cache entries
The runtime cache uses TransientMemoryBackend (a plain PHP array) and
is never freed inside the process. In a web request that's fine; in a
CLI run that walks the whole site tree, it grows without bound.
Why per-chunk cleanup isn't enough
array_chunk($this->pids, $max) with $max = getMaxBindParameters() / 2 - 4
yields ≈ 32'763 on MariaDB/MySQL, so all page IDs of a typical site
fit into a single chunk. Any "between chunks" cleanup effectively runs
only at the very end of each table, by which point hundreds of MB of
runtime cache entries have already accumulated.
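The arithmetic can be checked with a few lines of PHP. This is a sketch under one assumption: a bind-parameter limit of 65535, which is the value that reproduces the ≈ 32'763 figure quoted above. The integer cast is also an assumption about how the extension rounds the result.

```php
<?php
declare(strict_types=1);

// Assumption: the driver reports a limit of 65535 bind parameters.
$maxBindParameters = 65535;

// Same formula as in the text; the cast to int is an assumption of this sketch.
$max = (int)($maxBindParameters / 2 - 4);

// 18'320 page IDs, the size of the benchmark install.
$pids = range(1, 18320);
$chunks = array_chunk($pids, $max);

printf("chunk size: %d, chunks: %d\n", $max, count($chunks));
// prints "chunk size: 32763, chunks: 1"
```

With a single chunk, a cleanup step placed between chunks never runs while records are still being processed.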
How
The runtime cache is request-scoped and rebuilt on demand, so flushing
has no functional impact: the next getRecord()/BEgetRootLine() call
just re-queries (or rebuilds from the regular page caches).
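The counter-based flush can be sketched as below. This is an illustration only, not the PR's diff: ArrayRuntimeCache and processRecords() are hypothetical names, and the array-backed class merely imitates a transient in-memory backend so the example runs without TYPO3. In TYPO3 itself the flush would presumably go through the cache obtained from CacheManager.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the runtime cache: a plain PHP array.
final class ArrayRuntimeCache
{
    /** @var array<string, mixed> */
    private array $entries = [];
    public int $flushCount = 0;
    public int $peakEntries = 0;

    public function set(string $key, mixed $value): void
    {
        $this->entries[$key] = $value;
        $this->peakEntries = max($this->peakEntries, count($this->entries));
    }

    public function flush(): void
    {
        $this->entries = [];
        $this->flushCount++;
    }
}

/**
 * Streams over the rows and flushes the cache every $interval records,
 * plus once at the end for any remaining entries.
 */
function processRecords(iterable $rows, ArrayRuntimeCache $cache, int $interval = 1000): int
{
    $processed = 0;
    foreach ($rows as $row) {
        // Stand-in for the cache entries created while checking one record.
        $cache->set('rootline-' . $row['uid'], [$row['uid'], $row['pid']]);

        if (++$processed % $interval === 0) {
            $cache->flush();
        }
    }
    $cache->flush(); // final flush for the chunk

    return $processed;
}

// 2500 fake records, streamed one at a time like a database result.
$rows = (function (): Generator {
    for ($uid = 1; $uid <= 2500; $uid++) {
        yield ['uid' => $uid, 'pid' => intdiv($uid, 10)];
    }
})();

$cache = new ArrayRuntimeCache();
$processed = processRecords($rows, $cache);
```

In this sketch the cache never holds more than $interval entries, regardless of how many rows the result set contains.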
Memory diagnostics
Tested against an 18'320 pages / 90'110 tt_content install with
TYPO3 v13.4, PHP 8.3, MariaDB 10.11.
The unpatched run's RSS grows unbounded; the patched run plateaus
quickly and stays flat through the rest of the workload. RSS profile
(N=1000, 450M cap):
Why N=1000 (counterintuitive empirical finding)
We benchmarked several values for N against the same install at a
450M memory_limit, all with a warm brofix link target cache so HTTP
work was constant:
The intuitive expectation is "smaller N = lower peak", but the data
shows otherwise. Smaller intervals (≤400) all plateau at the same
~465 MB level, while N=1000 plateaus ~80 MB lower. The likely cause is
heap fragmentation: very frequent flush() calls churn through
allocate/free cycles on the runtime cache's internal arrays, leaving
fragments PHP cannot return to the OS. Larger N produces fewer such
cycles and lets PHP keep memory more compact. Going further to
N=2000 lets the cache itself grow back between flushes, raising the
peak again. N=1000 is the empirical sweet spot.
Trade-offs and alternatives considered
- on memory and slightly worse fragmentation; see table above.
- LIMIT/OFFSET pagination of the inner query instead of
streaming + flush: adds query overhead (deep OFFSETs are slow on
MariaDB) and requires QueryBuilder cloning. The streaming + counter
approach is simpler and has no query-cost regression.
- configured TYPO3 sites; does nothing on single-site installs because
the one subprocess still has to walk the whole page tree.
Compatibility
- (CacheManager is the new nullable trailing argument with a
makeInstance() fallback).
- what's held in PHP memory, and the runtime cache rebuilds on demand.
- cache_runtime via the same API.