diff --git a/handoffs/legacy-utf8-divergence-report.md b/handoffs/legacy-utf8-divergence-report.md
new file mode 100644
index 0000000000000..e9519a449ec62
--- /dev/null
+++ b/handoffs/legacy-utf8-divergence-report.md
@@ -0,0 +1,401 @@
+# Legacy UTF-8 Helper Divergence Report
+
+## Scope
+
+This report covers the current `utf8-survey` checkout and responds to the handoff in
+`encoding-fuzzer/handoffs/legacy-utf8-divergence-survey.md`.
+
+No production code was changed for this survey. The throwaway runner lived at
+`/private/tmp/legacy_utf8_divergence_survey.php` and reused the generator and
+oracle battery from the adjacent `encoding-fuzzer/tools/encoding-fuzz/`
+checkout. It loaded this checkout's `compat.php`, `compat-utf8.php`,
+`utf8.php`, and `formatting.php` with minimal stubs.
+
+The generated pass was deterministic: case `N` used
+`new EncodingFuzz\Prng( "legacy-utf8-divergence:N" )`. The exact command was:
+
+```sh
+php /private/tmp/legacy_utf8_divergence_survey.php 3000000 256 > /private/tmp/legacy_utf8_divergence_survey_results.json
+```
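+
+Each case's PRNG is derived only from the label `legacy-utf8-divergence:N`, so any
+single case can be regenerated without replaying earlier ones. The report does
+not describe how `EncodingFuzz\Prng` turns that label into a byte stream. The
+Python sketch below is therefore a hypothetical stand-in: it hashes the label
+with SHA-256 to seed `random.Random`, and `generate_case` models only the
+random-bytes strategy.
+
+```python
+import hashlib
+import random
+
+
+def case_rng(n: int) -> random.Random:
+    """Hypothetical stand-in for the fuzzer's Prng: seed only from the case label."""
+    label = f"legacy-utf8-divergence:{n}".encode("ascii")
+    seed = int.from_bytes(hashlib.sha256(label).digest(), "big")
+    return random.Random(seed)
+
+
+def generate_case(n: int, max_len: int = 256) -> bytes:
+    """Illustrative random-bytes strategy; reproducible from n alone."""
+    rng = case_rng(n)
+    length = rng.randint(0, max_len)
+    return bytes(rng.randrange(256) for _ in range(length))
+
+
+# Case 1234 can be regenerated on its own, without replaying cases 0..1233.
+assert generate_case(1234) == generate_case(1234)
+```
+
+With this scheme, a divergence report only has to record `N` for the case to be
+reproducible.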
+
+For auditability, a cleaned-up copy of the throwaway runner was committed in
+`700d7c8c910f` (`Charset: Add legacy UTF-8 survey runner`) and removed in the
+follow-up commit after this report recorded that provenance.
+
+Note on the current branch: the handoff describes `wp_check_invalid_utf8()` as
+PCRE-based. That was historically correct, but this checkout already contains
+the 6.9-era rewrite from `d1e7f5625b`, so the implementation now calls
+`wp_is_valid_utf8()` and `wp_scrub_utf8()` when `blog_charset` is UTF-8.
+
+## Environment
+
+- PHP: 8.4.21
+- Extensions present: `mbstring`, `intl`
+- PCRE Unicode support: yes
+- Generated pass: 3,000,000 inputs, 366,593,389 bytes, max generated input
+  size 256 bytes
+- Generator strategies: random bytes, random ASCII, valid UTF-8,
+  mutated-valid UTF-8, atom splices, latin1-ish text, UTF-16 bytes,
+  ASCII fast paths, repeated motifs
+- Oracle battery: `EncodingFuzz\Oracles::battery()`
+
+`wp_check_invalid_utf8()` caches the first `is_utf8_charset()` result in a
+static variable. To compare UTF-8 and non-UTF-8 charset behavior in one process,
+the runner used an equivalent uncached copy of the current implementation for
+the charset comparison. This models first-call behavior in a fresh request under
+each charset.
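+
+A minimal Python model of why the cache matters (not the PHP code):
+`CachedChecker` memoizes its first charset decision the way the report
+describes, while `check_uncached` re-evaluates `blog_charset` on every call, as
+the runner's uncached copy did. Both model only the default, empty-string mode
+and use Python's strict UTF-8 decoder as the validator. The charset test is
+assumed to be a case-insensitive match on `utf-8`/`utf8`.
+
+```python
+from typing import Optional
+
+
+def is_strict_utf8(data: bytes) -> bool:
+    """Strict validation via Python's decoder (rejects overlongs, surrogates, > U+10FFFF)."""
+    try:
+        data.decode("utf-8")
+    except UnicodeDecodeError:
+        return False
+    return True
+
+
+class CachedChecker:
+    """Models a checker that caches its first charset decision, like a PHP static."""
+
+    def __init__(self) -> None:
+        self._is_utf8_charset: Optional[bool] = None
+
+    def check(self, data: bytes, blog_charset: str) -> bytes:
+        if self._is_utf8_charset is None:  # only the first call consults the charset
+            self._is_utf8_charset = blog_charset.lower() in ("utf-8", "utf8")
+        if not self._is_utf8_charset:
+            return data  # passthrough for non-UTF-8 charsets
+        return data if is_strict_utf8(data) else b""
+
+
+def check_uncached(data: bytes, blog_charset: str) -> bytes:
+    """Models the runner's uncached copy: the charset is evaluated on every call."""
+    if blog_charset.lower() not in ("utf-8", "utf8"):
+        return data
+    return data if is_strict_utf8(data) else b""
+
+
+checker = CachedChecker()
+checker.check(b"\xc0\x80", "UTF-8")  # first call fixes the cached decision
+print(checker.check(b"\xc0\x80", "ISO-8859-1"))  # b'' : still behaves as UTF-8
+print(check_uncached(b"\xc0\x80", "ISO-8859-1"))  # b'\xc0\x80' : passthrough
+```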
+
+## Aggregate Results
+
+| Measurement | Count |
+| --- | ---: |
+| Generated inputs | 3,000,000 |
+| Strict-valid inputs | 1,128,174 |
+| Strict-invalid inputs | 1,871,826 |
+| `seems_utf8()` accepted strict-invalid input | 92,007 |
+| `seems_utf8()` rejected strict-invalid input | 1,779,819 |
+| `seems_utf8()` rejected strict-valid input | 0 |
+| `wp_check_invalid_utf8( $s, false )` returned `''` for strict-invalid input under UTF-8 charset | 1,871,826 |
+| `wp_check_invalid_utf8( $s, true )` matched `wp_scrub_utf8( $s )` for strict-invalid input under UTF-8 charset | 1,871,826 |
+| `wp_check_invalid_utf8( $s, true )` mismatched `wp_scrub_utf8( $s )` for strict-invalid input under UTF-8 charset | 0 |
+| `wp_check_invalid_utf8()` returned strict-invalid input unchanged under ISO-8859-1 charset | 1,871,826 |
+
+`seems_utf8()` accepted strict-invalid inputs in exactly these buckets:
+
+| Divergence class | Generated examples |
+| --- | ---: |
+| UTF-16 surrogate sequence | 21,051 |
+| Code point above `U+10FFFF` | 14,770 |
+| Obsolete 5-byte sequence | 6,850 |
+| Obsolete 6-byte sequence | 6,764 |
+| Overlong 2-byte sequence | 14,686 |
+| Overlong 3-byte sequence | 13,775 |
+| Overlong 4-byte sequence | 14,111 |
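+
+The two tables are internally consistent, which is worth rechecking whenever the
+survey is rerun. A small Python check that uses only the counts reported above:
+
+```python
+# Counts copied from the Aggregate Results tables.
+generated = 3_000_000
+strict_valid = 1_128_174
+strict_invalid = 1_871_826
+seems_accepted_invalid = 92_007
+seems_rejected_invalid = 1_779_819
+
+buckets = {
+    "UTF-16 surrogate sequence": 21_051,
+    "Code point above U+10FFFF": 14_770,
+    "Obsolete 5-byte sequence": 6_850,
+    "Obsolete 6-byte sequence": 6_764,
+    "Overlong 2-byte sequence": 14_686,
+    "Overlong 3-byte sequence": 13_775,
+    "Overlong 4-byte sequence": 14_111,
+}
+
+# Every generated input is classified exactly once by the strict oracle.
+assert strict_valid + strict_invalid == generated
+# seems_utf8() either accepts or rejects each strict-invalid input.
+assert seems_accepted_invalid + seems_rejected_invalid == strict_invalid
+# The seven divergence buckets sum to the accepted strict-invalid count.
+assert sum(buckets.values()) == seems_accepted_invalid
+
+share = seems_accepted_invalid / strict_invalid
+print(f"seems_utf8() accepted {share:.2%} of strict-invalid inputs")  # 4.92%
+```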
+
+No generated class showed `wp_check_invalid_utf8( $s, true )` diverging from
+`wp_scrub_utf8( $s )` when `blog_charset` was UTF-8.
+
+## Divergence Matrix
+
+All byte strings are hex. `R` means one `U+FFFD` replacement character, encoded
+as `EF BF BD`. `same` means the original byte string is returned unchanged.
+
+| Input class | Minimal bytes | `wp_is_valid_utf8()` | `seems_utf8()` | `wp_check_invalid_utf8( false )`, UTF-8 charset | `wp_check_invalid_utf8( true )`, UTF-8 charset | `wp_check_invalid_utf8()`, non-UTF-8 charset |
+| --- | --- | --- | --- | --- | --- | --- |
+| ASCII | `41` | accept | accept | same | same | same |
+| Valid 2-byte lower edge | `C2 80` | accept | accept | same | same | same |
+| Valid 3-byte lower edge | `E0 A0 80` | accept | accept | same | same | same |
+| Valid 4-byte upper edge | `F4 8F BF BF` | accept | accept | same | same | same |
+| Noncharacter `U+FFFE` | `EF BF BE` | accept | accept | same | same | same |
+| Replacement character `U+FFFD` | `EF BF BD` | accept | accept | same | same | same |
+| Lone continuation | `80` | reject | reject | `''` | `R` | same |
+| Invalid `FE`/`FF` lead | `FE` | reject | reject | `''` | `R` | same |
+| Truncated 2-byte sequence | `C2` | reject | reject | `''` | `R` | same |
+| Truncated 3-byte sequence | `E2 8C` | reject | reject | `''` | `R` | same |
+| Truncated 4-byte sequence | `F1 80 80` | reject | reject | `''` | `R` | same |
+| Overlong 2-byte sequence | `C0 80` | reject | accept | `''` | `R R` | same |
+| Overlong 3-byte sequence | `E0 80 80` | reject | accept | `''` | `R R R` | same |
+| Overlong 4-byte sequence | `F0 80 80 80` | reject | accept | `''` | `R R R R` | same |
+| UTF-16 surrogate sequence | `ED A0 80` | reject | accept | `''` | `R R R` | same |
+| Above `U+10FFFF`, `F4` form | `F4 90 80 80` | reject | accept | `''` | `R R R R` | same |
+| Above `U+10FFFF`, `F5` form | `F5 80 80 80` | reject | accept | `''` | `R R R R` | same |
+| Obsolete 5-byte sequence | `F8 80 80 80 80` | reject | accept | `''` | `R R R R R` | same |
+| Obsolete 6-byte sequence | `FC 80 80 80 80 80` | reject | accept | `''` | `R R R R R R` | same |
+| Valid text plus overlong bytes | `41 C0 80 5A` | reject | accept | `''` | `41 R R 5A` | same |
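+
+The matrix can be cross-checked outside WordPress. In this Python sketch,
+`loose_seems_utf8` models the bit-mask approach the report attributes to
+`seems_utf8()`: lead-byte masks for 1- to 6-byte forms, each followed by
+`10xxxxxx` continuation bytes. It is a model, not the PHP source. Python's strict
+decoder stands in for `wp_is_valid_utf8()`, and `errors="replace"` stands in for
+`wp_scrub_utf8()`. Python replaces each maximal invalid subpart with one
+`U+FFFD`, which reproduces the `R` counts for these rows. Equivalence with the
+PHP function beyond these examples is an assumption.
+
+```python
+def loose_seems_utf8(data: bytes) -> bool:
+    """Model of a bit-mask UTF-8 heuristic: checks lead/continuation shape only."""
+    i, length = 0, len(data)
+    while i < length:
+        c = data[i]
+        if c < 0x80:
+            n = 0  # 0xxxxxxx
+        elif (c & 0xE0) == 0xC0:
+            n = 1  # 110xxxxx
+        elif (c & 0xF0) == 0xE0:
+            n = 2  # 1110xxxx
+        elif (c & 0xF8) == 0xF0:
+            n = 3  # 11110xxx
+        elif (c & 0xFC) == 0xF8:
+            n = 4  # 111110xx, obsolete 5-byte form
+        elif (c & 0xFE) == 0xFC:
+            n = 5  # 1111110x, obsolete 6-byte form
+        else:
+            return False  # stray continuation byte or FE/FF lead
+        for _ in range(n):  # n continuation bytes must follow
+            i += 1
+            if i == length or (data[i] & 0xC0) != 0x80:
+                return False
+        i += 1
+    return True
+
+
+def strict_valid(data: bytes) -> bool:
+    try:
+        data.decode("utf-8")
+        return True
+    except UnicodeDecodeError:
+        return False
+
+
+def scrub(data: bytes) -> bytes:
+    return data.decode("utf-8", errors="replace").encode("utf-8")
+
+
+def render(data: bytes) -> str:
+    """Render scrubbed output like the matrix: hex bytes, each U+FFFD shown as R."""
+    parts = []
+    for ch in data.decode("utf-8", errors="replace"):
+        if ch == "\ufffd":
+            parts.append("R")
+        else:
+            parts.extend(f"{b:02X}" for b in ch.encode("utf-8"))
+    return " ".join(parts)
+
+
+for hex_bytes in ["41", "EF BF BE", "C2", "E2 8C", "C0 80", "E0 80 80",
+                  "ED A0 80", "F4 90 80 80", "F5 80 80 80",
+                  "F8 80 80 80 80", "FC 80 80 80 80 80", "41 C0 80 5A"]:
+    data = bytes.fromhex(hex_bytes)
+    scrubbed = "same" if scrub(data) == data else render(data)
+    print(f"{hex_bytes}: strict={strict_valid(data)} "
+          f"loose={loose_seems_utf8(data)} scrub={scrubbed}")
+```
+
+The rows where `loose` is `True` and `strict` is `False` are the rows where
+`seems_utf8()` and `wp_is_valid_utf8()` disagree in the matrix.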
+
+## Divergence Classes
+
+### `seems_utf8()` accepts overlong encodings
+
+Representative inputs: `C0 80`, `E0 80 80`, `F0 80 80 80`.
+
+Classification: accidental if the caller expects valid UTF-8; historically
+load-bearing only as a loose structural heuristic.
+
+Evidence:
+
+- `src/wp-includes/formatting.php` says the function checks whether the string
+  "fits a UTF-8 model", not whether it is well-formed UTF-8.
+- Core Trac #38044 was specifically opened to make `seems_utf8()` RFC 3629
+  compliant and calls out overlong acceptance as a defect:
+  https://core.trac.wordpress.org/ticket/38044
+- Commit `bb6ed3ba22` introduced `wp_is_valid_utf8()` and deprecated
+  `seems_utf8()` instead of tightening the old function in place.
+
+Impact:
+
+Replacing `seems_utf8()` with `wp_is_valid_utf8()` is behavior-changing for
+any input, including saved data, that contains these bytes: `seems_utf8()`
+returns `true`, while `wp_is_valid_utf8()` returns `false`.
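+
+Why these bytes are invalid is easiest to see by extracting the payload bits by
+hand: `C0 80` carries code point `U+0000`, whose shortest form is one byte. This
+Python sketch decodes the payload without validity checks (`raw_payload` is an
+illustrative name) and compares each sequence length with the shortest length
+RFC 3629 allows for that code point.
+
+```python
+def raw_payload(seq: bytes) -> int:
+    """Extract payload bits from a 2- to 4-byte sequence without validating it."""
+    lead_mask = {2: 0x1F, 3: 0x0F, 4: 0x07}[len(seq)]  # payload bits in the lead byte
+    value = seq[0] & lead_mask
+    for cont in seq[1:]:
+        value = (value << 6) | (cont & 0x3F)  # each continuation byte adds 6 bits
+    return value
+
+
+def shortest_length(cp: int) -> int:
+    """Minimum UTF-8 length for a code point."""
+    if cp < 0x80:
+        return 1
+    if cp < 0x800:
+        return 2
+    if cp < 0x10000:
+        return 3
+    return 4
+
+
+for hex_bytes in ("C0 80", "E0 80 80", "F0 80 80 80", "C2 80"):
+    seq = bytes.fromhex(hex_bytes)
+    cp = raw_payload(seq)
+    overlong = len(seq) > shortest_length(cp)
+    print(f"{hex_bytes}: U+{cp:04X}, {len(seq)} bytes, overlong={overlong}")
+```
+
+`C2 80` decodes to `U+0080`, which genuinely needs two bytes, so it is the valid
+lower edge shown in the matrix.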
+
+### `seems_utf8()` accepts UTF-16 surrogate encodings
+
+Representative input: `ED A0 80`.
+
+Classification: accidental. Surrogate halves are not Unicode scalar values and
+are rejected by the strict validator and by the fuzzer's oracle battery.
+
+Evidence:
+
+- Trac #38044 explicitly names surrogate acceptance as part of the RFC 3629
+  compliance problem: https://core.trac.wordpress.org/ticket/38044
+- The current `wp_is_valid_utf8()` docblock gives surrogate halves as invalid
+  examples.
+
+Impact:
+
+Same as overlongs: `wp_is_valid_utf8()` is the correct replacement for
+validation, but it is not a byte-for-byte-compatible replacement.
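+
+`ED A0 80` has a perfectly regular 3-byte shape, which is why a shape-only check
+accepts it, but its payload is `U+D800`, a high surrogate. Python's decoder
+shows both sides. Strict decoding rejects it, while the `surrogatepass` error
+handler (a Python-specific escape hatch, not a WordPress option) decodes it to
+the lone surrogate.
+
+```python
+seq = bytes.fromhex("ED A0 80")
+
+# Payload arithmetic: 4 bits from the lead byte, 6 from each continuation byte.
+cp = ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)
+print(f"U+{cp:04X}", 0xD800 <= cp <= 0xDFFF)  # U+D800 True
+
+try:
+    seq.decode("utf-8")
+except UnicodeDecodeError as err:
+    print("strict decoding rejects it:", err.reason)
+
+print(repr(seq.decode("utf-8", errors="surrogatepass")))  # '\ud800'
+```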
+
+### `seems_utf8()` accepts code points above `U+10FFFF`
+
+Representative inputs: `F4 90 80 80`, `F5 80 80 80`.
+
+Classification: accidental. The code accepts any `F0`-`F7` lead followed by
+three continuation bytes, but modern UTF-8 stops at `F4 8F BF BF`.
+
+Evidence:
+
+- The current `wp_is_valid_utf8()` docblock defines well-formed UTF-8 as
+  excluding characters above the representable range.
+- Trac #38044 frames the replacement around RFC 3629 compliance, whose range is
+  `U+0000..U+10FFFF`.
+
+Impact:
+
+Strict replacement rejects bytes that the legacy heuristic accepted. Treat this
+as a migration break for data-validation callers.
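+
+The arithmetic behind the `F4 8F BF BF` boundary: a 4-byte form carries 21
+payload bits (3 from the lead byte, 6 from each of the three continuation bytes),
+so the shape alone can express values up to `0x1FFFFF`, while RFC 3629 caps code
+points at `0x10FFFF`. A Python sketch of the payloads for the boundary and the
+two representative inputs:
+
+```python
+def four_byte_payload(seq: bytes) -> int:
+    """Payload of a 4-byte lead/continuation shape, without range checks."""
+    value = seq[0] & 0x07
+    for cont in seq[1:]:
+        value = (value << 6) | (cont & 0x3F)
+    return value
+
+
+MAX_SCALAR = 0x10FFFF  # RFC 3629 upper bound
+
+for hex_bytes in ("F4 8F BF BF", "F4 90 80 80", "F5 80 80 80"):
+    cp = four_byte_payload(bytes.fromhex(hex_bytes))
+    print(f"{hex_bytes}: 0x{cp:X}, in range={cp <= MAX_SCALAR}")
+
+print(hex(four_byte_payload(bytes.fromhex("F7 BF BF BF"))))  # 0x1fffff, largest shape value
+```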
+
+### `seems_utf8()` accepts obsolete 5- and 6-byte forms
+
+Representative inputs: `F8 80 80 80 80`, `FC 80 80 80 80 80`.
+
+Classification: documented historical looseness, not valid UTF-8. The
+docblock warns that the function checks 5-byte sequences even though UTF-8 has a
+maximum length of 4 bytes; the code also accepts 6-byte forms.
+
+Evidence:
+
+- The 5-byte warning was added in the 2009 cleanup associated with Trac #9692:
+  https://core.trac.wordpress.org/ticket/9692
+- Trac #38044 records the later decision to deprecate rather than repair this
+  legacy behavior in place.
+
+Impact:
+
+This is the only non-strict behavior the docblock already documents. A strict
+replacement is still desirable for validation, but compatibility notes should
+call out the change explicitly.
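+
+The obsolete forms come from the pre-RFC 3629 definition of UTF-8 (RFC 2279),
+whose lead bytes `111110xx` and `1111110x` introduced 5- and 6-byte sequences.
+Counting payload bits shows why no such sequence can be valid today: values that
+fit in 21 bits already have a shorter form, and anything larger is above
+`U+10FFFF`. A short Python calculation:
+
+```python
+# Free payload bits in the lead byte for each sequence length.
+LEAD_PAYLOAD_BITS = {1: 7, 2: 5, 3: 4, 4: 3, 5: 2, 6: 1}
+
+for length, lead_bits in LEAD_PAYLOAD_BITS.items():
+    bits = lead_bits + 6 * (length - 1)  # plus 6 bits per continuation byte
+    print(f"{length}-byte form: {bits} payload bits, max 0x{(1 << bits) - 1:X}")
+
+# U+10FFFF needs 21 bits, which the 4-byte form already provides.
+assert (0x10FFFF).bit_length() == 21
+```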
+
+### `wp_check_invalid_utf8( $s, false )` rejects the whole string
+
+Representative input: `41 C0 80 5A`.
+
+Classification: intentional security behavior. Under UTF-8 charset, any invalid
+span makes the default mode return `''`; it does not preserve valid surrounding
+text.
+
+Evidence:
+
+- Trac #8767 introduced the helper in a security/XSS context and discussed the
+  default empty-string behavior as the more conservative validator-like option:
+  https://core.trac.wordpress.org/ticket/8767
+- The current docblock documents this default mode.
+
+Impact:
+
+`wp_scrub_utf8()` is not a drop-in replacement for default-mode callers because
+it preserves the string and inserts replacement characters. That can be a better
+user experience in some contexts, but it changes escaping and sanitization
+behavior.
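+
+The difference for `41 C0 80 5A` is easiest to see side by side. This Python
+sketch models the two output contracts under a UTF-8 charset only.
+`blank_if_invalid` and `scrub` are hypothetical names, with Python's strict and
+`errors="replace"` decoding standing in for the WordPress functions.
+
+```python
+def blank_if_invalid(data: bytes) -> bytes:
+    """Model of default mode: any invalid span empties the whole string."""
+    try:
+        data.decode("utf-8")
+    except UnicodeDecodeError:
+        return b""
+    return data
+
+
+def scrub(data: bytes) -> bytes:
+    """Model of scrubbing: keep valid text, replace invalid subparts with U+FFFD."""
+    return data.decode("utf-8", errors="replace").encode("utf-8")
+
+
+sample = bytes.fromhex("41 C0 80 5A")
+print(blank_if_invalid(sample))  # b''
+print(scrub(sample))  # b'A\xef\xbf\xbd\xef\xbf\xbdZ'
+```
+
+An escaper that receives the blanked value emits nothing; one that receives the
+scrubbed value emits `A`, two replacement characters, and `Z`.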
+
+### `wp_check_invalid_utf8( $s, true )` now scrubs with `U+FFFD`
+
+Representative input: `C0 80` produces `R R`.
+
+Classification: intentional current behavior. On this branch, `$strip = true`
+matches `wp_scrub_utf8()` for all generated strict-invalid inputs under UTF-8
+charset.
+
+Evidence:
+
+- Trac #63837 states the plan to rely on `wp_is_valid_utf8()` and add
+  `wp_scrub_utf8()` for replacement-character scrubbing:
+  https://core.trac.wordpress.org/ticket/63837
+- Commit `d1e7f5625b` says the old `$strip` defect was fixed and invalid bytes
+  are now replaced with `U+FFFD` for stronger security guarantees.
+
+Impact:
+
+For UTF-8 charset requests, `wp_scrub_utf8()` behaved identically to
+`wp_check_invalid_utf8( $s, true )` on every generated strict-invalid input; the
+remaining difference is the legacy function's `blog_charset` gate.
+
+### `wp_check_invalid_utf8()` passes through all bytes for non-UTF-8 charsets
+
+Representative input under `ISO-8859-1` charset: `C0 80` returns `C0 80` in
+both modes.
+
+Classification: intentional environment sensitivity.
+
+Evidence:
+
+- The current docblock says the function only performs work when
+  `blog_charset` is UTF-8.
+- Trac #63837 calls out that the function assumes input strings are encoded
+  with `blog_charset`, and says that point is inherent to how it works:
+  https://core.trac.wordpress.org/ticket/63837
+
+Impact:
+
+Neither `wp_is_valid_utf8()` nor `wp_scrub_utf8()` is a drop-in replacement
+where the caller intentionally wants `blog_charset`-dependent passthrough.
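+
+The three behaviors above can be combined in one Python model of the
+charset-gated contract. `check_invalid_utf8` is a hypothetical name, not the PHP
+function. It takes `blog_charset` as an explicit argument (the static cache is
+not modeled) and assumes a case-insensitive match on `utf-8`/`utf8` as the
+UTF-8 test.
+
+```python
+def is_utf8_charset(charset: str) -> bool:
+    # Assumption: UTF-8 detection is a case-insensitive match on these labels.
+    return charset.lower() in ("utf-8", "utf8")
+
+
+def check_invalid_utf8(data: bytes, strip: bool, blog_charset: str) -> bytes:
+    """Model: passthrough for non-UTF-8 charsets; otherwise blank or scrub."""
+    if not is_utf8_charset(blog_charset):
+        return data  # bytes pass through unchanged, valid or not
+    try:
+        data.decode("utf-8")
+        return data  # valid input is returned as-is
+    except UnicodeDecodeError:
+        if strip:
+            return data.decode("utf-8", errors="replace").encode("utf-8")
+        return b""
+
+
+bad = bytes.fromhex("C0 80")
+for charset in ("UTF-8", "ISO-8859-1"):
+    for strip in (False, True):
+        print(charset, strip, check_invalid_utf8(bad, strip, charset).hex(" ").upper())
+```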
+
+## Current Core Callers
+
+### `seems_utf8()`
+
+No production callers remain in this checkout. The only in-tree production
+reference found by `rg` is the function definition itself.
+
+Migration guidance:
+
+- For validation callers, `wp_is_valid_utf8()` is the intended replacement, but
+  it is behavior-changing for overlongs, surrogates, above-range code points,
+  and 5/6-byte forms.
+- For charset-guessing callers, `wp_is_valid_utf8()` is not a semantic drop-in.
+  Such callers should make the heuristic explicit instead of using
+  `seems_utf8()`.
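+
+For charset-guessing callers, an explicit heuristic might look like the
+following Python sketch. `guess_and_decode` and its fallback order (strict UTF-8
+first, then a named single-byte charset) are assumptions for illustration; a
+real caller would choose the fallback charset that matches its data.
+
+```python
+from typing import Tuple
+
+
+def guess_and_decode(data: bytes, fallback: str = "iso-8859-1") -> Tuple[str, str]:
+    """Hypothetical explicit heuristic: strict UTF-8 first, then a named fallback."""
+    try:
+        return data.decode("utf-8"), "utf-8"
+    except UnicodeDecodeError:
+        # Strict UTF-8 failed, so interpret the bytes in the fallback charset.
+        return data.decode(fallback, errors="replace"), fallback
+
+
+print(guess_and_decode("café".encode("utf-8")))  # ('café', 'utf-8')
+print(guess_and_decode("café".encode("iso-8859-1")))  # ('café', 'iso-8859-1')
+```
+
+Unlike `seems_utf8()`, this rejects overlongs and surrogates before falling
+back, and the caller can see which charset was chosen.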
+
+### `esc_js()`
+
+Current call: `wp_check_invalid_utf8( $text )`.
+
+Migration guidance:
+
+- `wp_is_valid_utf8()` is not a drop-in; it returns a boolean and does not
+  produce escaped text.
+- `wp_scrub_utf8()` is behavior-changing; invalid input would be preserved with
+  `U+FFFD` instead of blanked before JavaScript escaping.
+- Keep `wp_check_invalid_utf8()` unless the security model is explicitly
+  changed from whole-string rejection to scrubbing.
+
+### `esc_html()`
+
+Current call: `wp_check_invalid_utf8( $text )`.
+
+Migration guidance:
+
+- Switching to `wp_scrub_utf8()` is behavior-changing, but it could be adopted
+  as a future product decision if preserving partially valid display text is
+  preferred.
+- It is not a drop-in for current behavior because default-mode
+  `wp_check_invalid_utf8()` returns `''` for any invalid UTF-8 under UTF-8
+  charset and passes raw bytes through under non-UTF-8 charsets.
+
+### `esc_attr()`
+
+Current call: `wp_check_invalid_utf8( $text )`.
+
+Migration guidance:
+
+- Attribute context is especially sensitive to partial decoding and downstream
+  parser behavior. Keep whole-string rejection unless a dedicated security
+  review approves replacement-character scrubbing.
+- `wp_is_valid_utf8()` is not a drop-in output function.
+
+### `esc_xml()`
+
+Current call: `wp_check_invalid_utf8( $text )`.
+
+Migration guidance:
+
+- `wp_scrub_utf8()` would be plausible for XML generation because XML requires
+  valid character data, but it changes the output contract from blanking to
+  replacement.
+- A direct replacement needs XML-specific review, especially because XML also
+  has character restrictions beyond UTF-8 well-formedness.
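+
+The XML caveat is concrete: the XML 1.0 `Char` production excludes most C0
+control characters, surrogates, and `U+FFFE`/`U+FFFF`. Strictly valid UTF-8 such
+as `EF BF BE` (`U+FFFE`, accepted everywhere in the matrix) is therefore still not
+allowed in an XML document. A Python sketch of that extra check, with
+`is_xml10_text` as an illustrative name:
+
+```python
+def is_xml10_char(cp: int) -> bool:
+    """XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
+    return (
+        cp in (0x9, 0xA, 0xD)
+        or 0x20 <= cp <= 0xD7FF
+        or 0xE000 <= cp <= 0xFFFD
+        or 0x10000 <= cp <= 0x10FFFF
+    )
+
+
+def is_xml10_text(data: bytes) -> bool:
+    """Valid UTF-8 and every decoded character allowed by XML 1.0."""
+    try:
+        text = data.decode("utf-8")
+    except UnicodeDecodeError:
+        return False
+    return all(is_xml10_char(ord(ch)) for ch in text)
+
+
+print(is_xml10_text(bytes.fromhex("EF BF BE")))  # False: valid UTF-8, not an XML Char
+print(is_xml10_text(bytes.fromhex("00")))  # False: NUL is valid UTF-8, not an XML Char
+print(is_xml10_text(b"plain text"))  # True
+```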
+
+### `_sanitize_text_fields()`, via `sanitize_text_field()` and `sanitize_textarea_field()`
+
+Current call: `wp_check_invalid_utf8( $str )`.
+
+Migration guidance:
+
+- `wp_scrub_utf8()` is behavior-changing: stored/sanitized values that
+  currently become empty would retain valid surrounding text and replacement
+  characters.
+- This may be user-friendlier, but it is not a drop-in. Treat it as a product
+  and compatibility decision.
+
+### `_wp_json_convert_string()`
+
+Current fallback call: `wp_check_invalid_utf8( $input_string, true )`, only when
+`mb_convert_encoding()` is unavailable.
+
+Migration guidance:
+
+- Under UTF-8 charset on this branch, `wp_scrub_utf8()` is behavior-equivalent
+  for generated invalid inputs and is the clearer operation.
+- It is still not a full drop-in because `wp_check_invalid_utf8()` preserves raw
+  input when `blog_charset` is not UTF-8.
+- JSON output must be UTF-8, so this is the best candidate for a targeted future
+  migration away from `wp_check_invalid_utf8()`.
+
+## Recommendations
+
+### `seems_utf8()`: keep deprecated; do not repair in place
+
+The function is a loose structural heuristic with no remaining production core
+callers. Its bit-mask model accepts several classes of invalid UTF-8, and
+changing the implementation in place would silently change external caller
+behavior.
+
+Recommended action:
+
+- Keep the existing deprecation to `wp_is_valid_utf8()`.
+- Do not include it in continuous differential fuzzing against strict UTF-8
+  validation; the known divergences are permanent unless the deprecated
+  function is removed or deliberately changed at the cost of compatibility.
+- If docs are touched, say explicitly that it accepts overlongs, surrogates,
+  above-range code points, and obsolete 5/6-byte forms. The current docblock
+  mentions 5-byte sequences, but not the full divergence set.
+
+### `wp_check_invalid_utf8()`: document and leave for default-mode callers
+
+The current branch has already removed the historical PCRE dependency for
+UTF-8 charset requests. The remaining divergences are semantic:
+
+- default mode rejects the entire invalid string;
+- strip mode scrubs with `U+FFFD`;
+- all modes pass bytes through when `blog_charset` is not UTF-8.
+
+Recommended action:
+
+- Keep default-mode calls in escaping and sanitization until each context has an
+  explicit security and compatibility decision.
+- Prefer `wp_scrub_utf8()` for new code that unconditionally wants valid UTF-8
+  output and does not want `blog_charset` sensitivity.
+- Consider a targeted follow-up for `_wp_json_convert_string()`'s fallback path,
+  because JSON output must be UTF-8 and current `$strip = true` behavior already
+  matches `wp_scrub_utf8()` under UTF-8 charset.
+
+## Sources Checked
+
+- Local function history: `git log -L :seems_utf8:src/wp-includes/formatting.php`
+- Local function history: `git log -L :wp_check_invalid_utf8:src/wp-includes/formatting.php`
+- Current `seems_utf8()` deprecation and `wp_is_valid_utf8()` introduction:
+  commit `bb6ed3ba22`
+- Current `wp_check_invalid_utf8()` / `wp_scrub_utf8()` rewrite:
+  commit `d1e7f5625b`
+- Trac #9692, `seems_utf8()` cleanup:
+  https://core.trac.wordpress.org/ticket/9692
+- Trac #8767, original `wp_check_invalid_utf8()` security refactor:
+  https://core.trac.wordpress.org/ticket/8767
+- Trac #38044, RFC 3629 compliance and `wp_is_valid_utf8()`:
+  https://core.trac.wordpress.org/ticket/38044
+- Trac #63837, `wp_check_invalid_utf8()` rewrite and `wp_scrub_utf8()`:
+  https://core.trac.wordpress.org/ticket/63837
+- Trac #29717, historical PCRE behavior and caller importance:
+  https://core.trac.wordpress.org/ticket/29717
+- Trac #63863, standardizing UTF-8 handling and fallbacks:
+  https://core.trac.wordpress.org/ticket/63863