Skip to content

Site Health: Cache the full results and a timestamp in the status transient#12181

Open
gziolo wants to merge 11 commits into
WordPress:trunkfrom
gziolo:fix/65232-cache-site-health-results
Open

Site Health: Cache the full results and a timestamp in the status transient#12181
gziolo wants to merge 11 commits into
WordPress:trunkfrom
gziolo:fix/65232-cache-site-health-results

Conversation

@gziolo

@gziolo gziolo commented Jun 15, 2026

Copy link
Copy Markdown
Member

Trac ticket: https://core.trac.wordpress.org/ticket/65232

Note: The approach in this PR changed substantially based on review. It now keeps the existing counts transient lightweight and stores the full results in a separate, non-autoloaded cache. Earlier commits/discussion reflect the previous single-transient approach.

What & why

Trac #65232 wants a fast, cache-only Site Health summary for core/get-environment-info that never runs synchronous tests. Today the health-check-site-status-result transient stores only the aggregate good/recommended/critical counts, so any consumer that needs the underlying results has to re-run the tests.

As discussed on the ticket, the first step is to cache the full results (so a consumer — and later Trac #64066's page-cache detail — can reuse them) along with a timestamp so staleness can be judged. This PR delivers that caching foundation only. The Abilities site_health field and fields input are a follow-up that reads this cache.

Design

Two caches, with clearly separated jobs:

  • health-check-site-status-result — unchanged shape: only { good, recommended, critical }. Small, still autoloaded, backward-compatible for the admin-menu counter and Dashboard widget.
  • health-check-site-status-detail (new) — the full per-test results, keyed by test, each with its own timestamp. Stored with an expiration so it is not autoloaded (the larger payload must not load on every request). get_site_status_detail() also returns counts derived from those same results, so a consumer reads one internally consistent view.

Who writes what

  • The Site Health screen is authoritative for the detailed results, because it includes the asynchronous tests that require JavaScript to run. It posts the aggregate counts and the full results as two independent requests, so a large results payload can't block the lightweight counts refresh.
  • The weekly scheduled check is a backstop. It refreshes the counts transient and updates the detail cache, but only for entries that are missing or older than a week, so it won't discard fresher results collected from the screen. Per-result timestamps let stale entries (e.g. a test whose plugin was deactivated) be dropped after a month. An asynchronous test that is unavailable during the scheduled check is cached under its own identifier, so the counts and detailed results stay consistent.

Detailed cache shape

{
  "results": {
    "<test>": { "test": "", "label": "", "status": "good|recommended|critical",
                "badge": { "label": "", "color": "" },
                "description": "", "actions": "", "timestamp": 1715714399 }
  },
  "counts": { "good": 12, "recommended": 3, "critical": 0 },
  "timestamp": 1715714399
}

Canonical accessors (on WP_Site_Health)

  • get_site_status_counts() / set_site_status_counts() — the legacy UI counts (the small autoloaded transient).
  • get_site_status_detail(){ results, counts, timestamp }, internally consistent.
  • update_site_status_detail( $results, $authoritative ) — sanitizes (wp_kses_post + field sanitizers) and merges into the detail cache.

Backward compatibility

The counts transient keeps its existing shape, so menu.php, the Dashboard widget, and the localized SiteHealth data are unchanged. No WP_Site_Health load is forced on the admin-menu path.

Open questions for reviewers

  1. Merge policy. The scheduled check currently uses last-writer-wins with a one-week grace window so it won't clobber fresher screen results. Should browser-collected results instead strictly outrank cron (via a source field)?
  2. Detail-cache size. There is no hard cap on the detail payload. On sites with many plugin-registered tests it could exceed an object cache's per-item limit (e.g. Memcached's 1 MB), in which case the detail simply isn't cached (the counts hot-path is unaffected). Worth a size cap/trim, or acceptable as-is?

Testing

  • tests/phpunit/tests/admin/wpSiteHealth.php — scheduled-check counts vs. detail split, the fresh-preserve / stale-refresh / stale-drop behavior, derived detail counts, and the unavailable-async fallback being cached under its test identifier.
  • tests/phpunit/tests/ajax/wpAjaxHealthCheckSiteStatusResult.php — independent counts-only and results-only requests, HTML sanitization, capability enforcement.

phpcs, PHPStan, and the full site-health and ajax test groups pass locally.

Use of AI Tools

AI assistance: Yes
Tool(s): Claude Code (Claude Opus 4.8), Codex
Used for: Development and validation; reviewed and verified by me.

🤖 Generated with Claude Code

…nsient.

Previously the `health-check-site-status-result` transient stored only the
aggregate `good`, `recommended`, and `critical` counts, so any consumer that
needed the underlying results had to re-run the Site Health tests synchronously.

The scheduled check now caches the complete results array and the time they were
collected alongside the counts. The Site Health screen AJAX handler validates the
submitted counts and preserves the cached results and timestamp while refreshing
the counts, and `enqueue_scripts()` only localizes the counts to avoid embedding
the full results in every Site Health and Dashboard page.

This provides a reusable cached source of detailed Site Health data for the
dashboard, the admin menu, and future REST/Abilities consumers without triggering
synchronous tests.

Adds unit tests covering the scheduled-check caching and the AJAX result handler.

See #65232.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props gziolo, westonruter.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@github-actions

Copy link
Copy Markdown

Hi there! 👋

Thank you for your contribution to WordPress! 💖

It looks like this is your first pull request to wordpress-develop. Here are a few things to be aware of that may help you out!

No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description.

Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making.

More information about how GitHub pull requests can be used to contribute to WordPress can be found in the Core Handbook.

Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook.

If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook.

The Developer Hub also documents the various coding standards that are followed:

Thank you,
The WordPress Project

@gziolo gziolo requested a review from westonruter June 15, 2026 14:10
@gziolo gziolo self-assigned this Jun 15, 2026
@github-actions

Copy link
Copy Markdown

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

Comment thread src/wp-admin/includes/ajax-actions.php Outdated
Comment on lines +5483 to +5501
/*
* Only the aggregate counts are refreshed here. Preserve the full test results
* and the timestamp recorded by the scheduled check so they are not discarded
* when the counts are updated from the Site Health screen.
*/
$cached = get_transient( 'health-check-site-status-result' );
$cached = is_string( $cached ) ? json_decode( $cached, true ) : null;

if ( is_array( $cached ) ) {
if ( isset( $cached['results'] ) && is_array( $cached['results'] ) ) {
$site_status['results'] = $cached['results'];
}

if ( isset( $cached['timestamp'] ) ) {
$site_status['timestamp'] = (int) $cached['timestamp'];
}
}

set_transient( 'health-check-site-status-result', wp_json_encode( $site_status ) );

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the results here superior to the results obtained via wp_cron_scheduled_check()? They include the results for the async tests which require JS to run and thus cannot be computed via WP Cron. Should the logic here be instead moved to wp_cron_scheduled_check() so that async results are preserved after WP Cron runs?

Aside: we may want to add the timestamp as a key inside each result. This would allow us to drop test results which are overly stale (e.g. if a plugin with a test was deactivated).

@westonruter westonruter left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to my comment about letting the ajax-stored result be preferred, here's a critical observation that Claude made: the lack of an $expiration means that the transient will be autoloaded.

Since it now includes all of the results which could potentially be large, this would run the risk of increasing the bucket of autoloaded options beyond the 1MB limit imposed by Memcached, right?

Full review from Claude

Confirmed the critical detail: set_transient() is called with no expiration ($expiration = 0), so without an external object cache the option is stored with $autoload = true (option.php:1547-1552). The full results blob is therefore autoloaded on every page load site-wide, not just on Site Health screens. That makes the size concern much more severe than the finders estimated.

I have enough verified context. Here are the findings, ranked most-severe first.

Code Review: PR #12181 — Cache Site Health full results + timestamp

Findings

1. Full results blob is autoloaded on every page request (perf regression)class-wp-site-health.php:3492
set_transient( 'health-check-site-status-result', wp_json_encode( $site_status ) ) is called with no expiration. With no external object cache, set_transient() stores the option with autoload = true (option.php:1547-1552). Previously the value was ~60 bytes (three ints); now it's the entire $results array including every test's HTML description, actions, and badge. That payload is now loaded into memory on every front-end and admin request for every visitor, via wp_load_alloptions() — a site-wide cost paid to store data only the Site Health screen reads (and only the 3 counts from it). Passing an expiration, or storing under a non-autoloaded option, would avoid this.

2. Large results payload can exceed object-cache item limits → counter silently reverts to 0class-wp-site-health.php:3489
On installs with a persistent object cache (e.g. Memcached, default 1MB item limit), a site with many site_status_tests (plugins add them, with arbitrary HTML) can produce a JSON blob exceeding the backend's per-item limit. wp_cache_set() then silently fails, get_transient() returns false, and the admin-menu/dashboard counters fall back to 0/0/0 even though valid counts existed. No size cap or truncation is applied.

3. Race between scheduled cron and the AJAX counts update can discard freshly collected resultssrc/wp-admin/includes/ajax-actions.php:5488
The AJAX handler does a non-atomic read-modify-write: it reads the cached transient, splices in the POSTed counts, and re-writes. If wp_cron_scheduled_check() writes fresh results between the handler's get_transient() and set_transient() (admin loads Site Health while cron runs), the handler overwrites with the older (or absent) results it read earlier — silently dropping the detailed results the ticket exists to preserve, until the next weekly cron.

4. New status guard miscounts unknown/empty statuses and diverges from results countclass-wp-site-health.php:3471
The guard if ( ! is_array( $result ) || ! isset( $result['status'] ) ) continue; only checks that status is set. A result with status => '' or a custom value like 'info' still falls through to the final else and increments good, mislabeling it as a passing check. Separately, a result missing status was previously counted as good (old loop had no guard) but is now skipped entirely — so the cached results array length (e.g. 3) can no longer equal good + recommended + critical, which will surprise any consumer that assumes they match.

5. timestamp freshness mechanism is half-builtclass-wp-site-health.php:3490
The comment says the timestamp lets freshness "be evaluated," but: the transient is set with no expiry, nothing in core ever compares the timestamp to now, and the AJAX handler preserves the old cron timestamp on every counts update (ajax-actions.php:5495-5497). So the stored timestamp reflects the last full cron run, not the last update — any future freshness check built on it would be subtly wrong, and the promised mechanism doesn't actually exist yet.

6. Payload schema duplicated across writers/readers with no canonical accessorsrc/wp-admin/includes/ajax-actions.php:5477
The count-normalization array( 'good' => (int) ( $x['good'] ?? 0 ), 'recommended' => …, 'critical' => … ) is copied verbatim into ajax-actions.php:5477-5481 and class-wp-site-health.php:126-130, and the get_transient() + json_decode( …, true ) idiom is duplicated too. The transient's shape is now defined by convention across five sites (cron writer, ajax merge, enqueue reader, menu.php, dashboard.php). Adding a status bucket or renaming a key requires editing several hand-maintained literals; missing one yields counts that silently disagree between the menu counter, dashboard widget, and Site Health screen. A single read/write/normalize helper would centralize it.

7. Enqueue decodes the entire results blob just to read three integersclass-wp-site-health.php:117
On every Site Health and Dashboard page render, json_decode() parses the full cached payload (all HTML results) only to extract good/recommended/critical — the comment itself notes "only the aggregate counts are needed." The decode cost scales with the size/number of test results, for no benefit. (Mitigated if findings 1 & 2 are addressed by not storing results in the same hot transient.)

8. AJAX handler re-encodes the full results blob on every counts POSTsrc/wp-admin/includes/ajax-actions.php:5501
Each time an admin loads the Site Health Status tab, the JS posts the counts and this handler reads, decodes, and re-wp_json_encode()s the entire results payload back into the transient — repeated large serialization writes per page view, not just per weekly cron run (write amplification).


Note on a dropped candidate: finders flagged wp_json_encode( $site_status ) returning false (and storing false) on invalid UTF-8 in result HTML. I refuted it — wp_json_encode() runs _wp_json_sanity_check()/_wp_json_convert_string() to repair invalid UTF-8 and re-encode (functions.php:4411-4417), so the common case recovers; only a >512-depth structure would fail, which Site Health results never reach.

Findings 1–4 are the ones I'd block on; finding 1 in particular looks like a meaningful site-wide performance regression worth raising on the PR. Want me to post these as inline PR comments (/code-review --comment) or open a discussion on the specific lines?

@gziolo

gziolo commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks for the review! Reworked this substantially in 5bb9fee6a9 and updated the PR description. Key change: counts stay in the small autoloaded transient, full results move to a separate non-autoloaded detail cache — which also clears the race and write-amplification points. The Site Health screen is now authoritative for the detailed results (incl. async tests) with cron as a backstop, plus per-result timestamps.

Two design calls I'd value your take on before going further: the merge policy (1-week grace window vs. browser strictly outranking cron) and whether to cap the detail-cache size. Both noted in the description.

Disclosure: I used Claude Code and Codex to develop and validate these changes (reviewed and verified by me). 🙏

…ansient.

Following review feedback, store only the aggregate counts in the autoloaded
`health-check-site-status-result` transient and cache the full per-test results
separately in `health-check-site-status-detail`, which has an expiration so the
larger payload is not autoloaded on every request.

The Site Health screen submits the full results — including the asynchronous
tests that require JavaScript to run — and they are sanitized and cached as the
authoritative detailed results. The scheduled check refreshes only missing or
stale entries so it does not discard fresher results collected from the screen.
Each result carries its own timestamp, and entries that have not been refreshed
within a month are dropped.

Adds WP_Site_Health::get_site_status_counts(), get_site_status_detail(), and
merge_site_status_detail() as the canonical accessors, and updates the unit and
Ajax tests accordingly.

See #65232.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gziolo gziolo force-pushed the fix/65232-cache-site-health-results branch from 5bb9fee to 70b0566 Compare June 16, 2026 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants