Align heading-id slugs with jgm/djot#393; pluggable ASCII-folding extension #224

Merged: dereuromark merged 6 commits into master from fix/spec-heading-ids on Jun 6, 2026

@dereuromark (Contributor) commented Jun 6, 2026

Aligns auto-generated heading ids with the settled jgm/djot#393 rule, and adds ASCII
folding back as a pluggable, opt-in extension for sites that want maximum-portability
anchors.

Default change (spec-safe, per #393)

The identifier is formed by replacing each maximal run of non-alphanumeric ASCII
characters with a single - and trimming leading and trailing - from the result. Letter
case and all non-ASCII characters are preserved. This drops two djot-php-specific
behaviors that diverged from the spec:

  • always-on ASCII transliteration (Über->Uber) - now opt-in
  • the _ exception - _ is non-alphanumeric ASCII, so it is replaced (#393 removed
    the per-character exceptions, #391)

| input | before | after (#393) |
| --- | --- | --- |
| Über café | Uber-cafe | Über-café |
| under_score | under_score | under-score |
| 日本語 | ri-ben-yu | 日本語 |
| a--b | a-b | a-b |

A result that starts with a digit is prefixed with s- (9 lives -> s-9-lives) for
CSS-selector safety (querySelector('#9-x') throws). This is orthogonal to #393, which
governs punctuation only. This PR also changes that prefix from the old h- to s-,
matching the existing empty-heading s-N fallback. Empty results still fall back to s-N.
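
The default rule described above can be sketched in a few lines of PHP. This is a minimal illustration under stated assumptions, not djot-php's actual implementation: the function name `sketchHeadingId` and the `$fallbackIndex` parameter (standing in for the N of the s-N fallback) are hypothetical, and duplicate-id handling is left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the #393-style default; not the library's code.
 * $fallbackIndex stands in for the N of the "s-N" empty-heading fallback.
 */
function sketchHeadingId(string $text, int $fallbackIndex = 1): string
{
    // Byte-wise match (no /u flag): only ASCII bytes that are not 0-9, A-Z
    // or a-z are hit, so UTF-8 multibyte sequences (bytes >= 0x80) survive.
    // Each maximal run collapses into a single '-'.
    $id = preg_replace('/[\x00-\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+/', '-', $text) ?? '';

    // Trim leading and trailing '-'.
    $id = trim($id, '-');

    // Nothing left: use the s-N fallback.
    if ($id === '') {
        return 's-' . $fallbackIndex;
    }

    // Leading digit: prefix with s- so the id is usable in a CSS selector.
    if (ctype_digit($id[0])) {
        return 's-' . $id;
    }

    return $id;
}

echo sketchHeadingId('Über café'), "\n";   // Über-café
echo sketchHeadingId('under_score'), "\n"; // under-score
echo sketchHeadingId('9 lives'), "\n";     // s-9-lives
```

The regex deliberately avoids the /u flag so that the character class only ever sees ASCII bytes; that is one simple way to leave non-ASCII text untouched, and the real tracker may do it differently.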

Opt-in: AsciiHeadingIdsExtension

ASCII folding is no longer a baked-in flag - it is a pluggable id transform. The core
HeadingIdTracker produces the pure #393 id and accepts an optional
Closure(string): string; AsciiHeadingIdsExtension sets that transform. It is wired to
both the renderer's tracker and the parser's heading-reference resolution, so
<section id> and implicit [Heading][] link targets stay in parity. The transform runs
over the spec id and is re-slugged, so a transform that reintroduces separators (e.g. CJK
romanization) still yields a clean id.
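
To make the "transform, then re-slug" order concrete, here is a rough sketch. All names (`specSlug`, `TrackerSketch`, `idFor`) are hypothetical and do not reflect the real HeadingIdTracker API; the hard-coded romanization map is only a stand-in for a real transliterator, and the leading-digit prefix and duplicate handling are omitted for brevity.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the core #393 slug rule; not the library's code.
function specSlug(string $text): string
{
    $id = preg_replace('/[\x00-\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+/', '-', $text) ?? '';

    return trim($id, '-');
}

// Hypothetical tracker: an optional Closure(string): string runs over the
// spec id, and its output is slugged a second time.
final class TrackerSketch
{
    private ?\Closure $transform;

    public function __construct(?\Closure $transform = null)
    {
        $this->transform = $transform;
    }

    public function idFor(string $headingText): string
    {
        $id = specSlug($headingText);
        if ($this->transform !== null) {
            // Re-slug, because the transform may reintroduce separators.
            $id = specSlug(($this->transform)($id));
        }

        return $id;
    }
}

// Toy romanization that reintroduces spaces (assumed mapping, for illustration).
$romanize = static fn (string $id): string => strtr($id, ['日本語' => 'ri ben yu']);

echo (new TrackerSketch())->idFor('日本語'), "\n";          // 日本語
echo (new TrackerSketch($romanize))->idFor('日本語'), "\n"; // ri-ben-yu
```

Re-slugging after the transform keeps the core rule as the single place that decides what a valid separator is, so a transform author does not need to sanitize its own output.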

Retaining old behavior / web-safe ASCII URL ids (migration)

This is a breaking change to default heading-id output. To get behavior close to the
previous default - and ASCII-only, percent-encode-free, selector-safe anchors for web
publishing - add the extension:

$converter = new DjotConverter();
$converter->addExtension(new AsciiHeadingIdsExtension()); // Über café -> uber-cafe

Notes on backward compatibility:

  • The extension restores ASCII folding, but the punctuation rule still follows #393
    (_ -> -) and the prefix is now s- (not h-), so a handful of ids differ even with
    the extension on. There is no flag that reproduces the old ids byte-for-byte; that is
    a deliberate consequence of the spec alignment.
  • To pin a specific anchor regardless of the algorithm, set an explicit id on the heading
    (## Title {#my-id}) - explicit ids always win and are unchanged.
  • Unicode ids in the new default are valid HTML5 and resolve in browsers (non-ASCII
    characters are percent-encoded in URL fragments); ASCII is only needed for maximum
    portability, which the extension provides.
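
As a small illustration of the portability point, the snippet below percent-encodes both id styles with PHP's standard `rawurlencode()`. It only approximates what a browser does when serializing a fragment (the two agree for these inputs), and the variable names are illustrative.

```php
<?php
declare(strict_types=1);

// Illustration only: the two id styles once percent-encoded for a URL.
$unicodeId = 'Über-café'; // new default (#393)
$asciiId   = 'uber-cafe'; // with AsciiHeadingIdsExtension, per the example above

$unicodeFragment = '#' . rawurlencode($unicodeId);
$asciiFragment   = '#' . rawurlencode($asciiId);

echo $unicodeFragment, "\n"; // #%C3%9Cber-caf%C3%A9
echo $asciiFragment, "\n";   // #uber-cafe
```

Both anchors work; the ASCII form simply stays readable when a link is copied into plain text or passed through tools that do not decode percent-escapes.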

Notes

  • The official djot test suite is unchanged (its id cases are simple ASCII,
    case-preserved). Internal tests that pinned the old transliterating behavior were
    updated to the #393 output; the extension and the pluggable transform have their own
    tests.

Default heading-id generation now follows the settled #393 rule: replace each
maximal run of non-alphanumeric ASCII with a single '-' and trim leading/trailing
'-', preserving letter case and all non-ASCII characters. This drops the previous
always-on ASCII transliteration and the '_' exception, so 'Über café' becomes
'Über-café', 'under_score' becomes 'under-score', 'a--b' becomes 'a-b'. A
leading-digit result keeps the 'h-' prefix for CSS-selector safety (orthogonal to
#393, which governs punctuation only).

ASCII transliteration is now opt-in via a new asciiHeadingIds option on
DjotConverter, threaded to both the renderer and the parser's reference-resolution
pass so heading ids stay in parity. With it enabled, 'Über café' becomes
'uber-cafe'.

The official djot test suite is unchanged (its id cases are simple ASCII).

@dereuromark added the enhancement (New feature or request) label Jun 6, 2026
codecov Bot commented Jun 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.86%. Comparing base (fe05794) to head (23ef085).

Additional details and impacted files
@@            Coverage Diff            @@
##             master     #224   +/-   ##
=========================================
  Coverage     91.85%   91.86%           
- Complexity     3497     3504    +7     
=========================================
  Files           104      105    +1     
  Lines          9921     9929    +8     
=========================================
+ Hits           9113     9121    +8     
  Misses          808      808           

…x to s-

Replace the baked-in asciiHeadingIds bool with a pluggable id transform on
HeadingIdTracker (Closure(string): string). The core stays pure jgm/djot#393
(unicode-preserving); ASCII folding now ships as AsciiHeadingIdsExtension, which
sets the transform on both the renderer's tracker and the parser's heading-reference
resolution so section ids and [Heading][] link targets stay in parity.

The transform runs over the spec id and is re-slugged afterwards, so a transform
that reintroduces separators (e.g. CJK romanization "ri ben yu") still yields a
clean id.

Also unify the leading-digit prefix from h- to s- so it matches the empty-heading
s-N fallback (one prefix convention).

@dereuromark changed the title from "Align heading-id slugs with jgm/djot#393; add opt-in asciiHeadingIds" to "Align heading-id slugs with jgm/djot#393; pluggable ASCII-folding extension" Jun 6, 2026

Rewrite the heading-id reference section for the jgm/djot#393 default (letter case
and non-ASCII preserved, s- prefix for leading digits, s-N empty fallback) and
document AsciiHeadingIdsExtension as the opt-in ASCII-folding path, including its
parser/renderer parity and the note that registration order does not matter.

The headingIdTransformer property lost its (string): string signature when
phpcbf mangled the single-line annotation (6ea8855) into invalid syntax and the
workaround dropped to a bare \Closure|null. Restore the signature with the
two-line var + phpstan-var pattern already used by FrontmatterExtension: the
plain var stays \Closure|null (phpcbf leaves it untouched) and the parenthesized
phpstan-var carries the full (\Closure(string): string)|null. Verified phpcbf no
longer mangles it; phpcs and phpstan are clean.
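
The two-line annotation pattern described in this commit message might look roughly like the following. The class name and its methods are hypothetical; only the property name and the two annotation types come from the message above.

```php
<?php
declare(strict_types=1);

// Hypothetical holder class, used only to show the docblock pattern.
final class TransformerHolderSketch
{
    /**
     * Plain var keeps the simple type; phpstan-var carries the full signature.
     *
     * @var \Closure|null
     * @phpstan-var (\Closure(string): string)|null
     */
    private ?\Closure $headingIdTransformer = null;

    public function setHeadingIdTransformer(?\Closure $transformer): void
    {
        $this->headingIdTransformer = $transformer;
    }

    public function apply(string $id): string
    {
        // Without a transformer the id passes through unchanged.
        if ($this->headingIdTransformer === null) {
            return $id;
        }

        return ($this->headingIdTransformer)($id);
    }
}

$holder = new TransformerHolderSketch();
$holder->setHeadingIdTransformer(static fn (string $id): string => strtolower($id));
echo $holder->apply('Heading-ID'), "\n"; // heading-id
```

Splitting the annotation this way keeps the generic type in the tag the formatter rewrites, while static analysis still sees the callable's parameter and return types.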

@dereuromark merged commit b604a3f into master Jun 6, 2026
6 checks passed
@dereuromark deleted the fix/spec-heading-ids branch June 6, 2026 19:07