Align heading-id slugs with jgm/djot#393; pluggable ASCII-folding extension #224
Merged
Default heading-id generation now follows the settled #393 rule: replace each maximal run of non-alphanumeric ASCII with a single '-' and trim leading/trailing '-', preserving letter case and all non-ASCII characters. This drops the previous always-on ASCII transliteration and the '_' exception, so 'Über café' becomes 'Über-café', 'under_score' becomes 'under-score', and 'a--b' becomes 'a-b'. A leading-digit result keeps the 'h-' prefix for CSS-selector safety (orthogonal to #393, which governs punctuation only). ASCII transliteration is now opt-in via a new asciiHeadingIds option on DjotConverter, threaded to both the renderer and the parser's reference-resolution pass so heading ids stay in parity. With it enabled, 'Über café' becomes 'uber-cafe'. The official djot test suite is unchanged (its id cases are simple ASCII).
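For illustration, the #393 punctuation rule fits in a few lines of Python. This is a sketch, not djot-php's PHP code; spec_slug is a hypothetical name, and the leading-digit prefix is deliberately left out because #393 governs punctuation only.

```python
import re

# One or more consecutive ASCII characters that are neither letters nor digits.
# Characters at U+0080 and above are never matched, so non-ASCII text survives.
_NON_ALNUM_ASCII_RUN = re.compile(r"[\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+")


def spec_slug(text: str) -> str:
    """Apply the jgm/djot#393 punctuation rule (sketch, not the djot-php code)."""
    # Collapse each maximal run of non-alphanumeric ASCII into a single '-'.
    collapsed = _NON_ALNUM_ASCII_RUN.sub("-", text)
    # Trim leading/trailing separators; letter case is left untouched.
    return collapsed.strip("-")


for heading in ["Über café", "under_score", "a--b", "日本語"]:
    print(f"{heading!r} -> {spec_slug(heading)!r}")
```

Because the character class is restricted to ASCII, '_' and spaces become separators while 'Ü', 'é', and CJK characters pass through unchanged.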
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #224 +/- ##
=========================================
  Coverage     91.85%   91.86%
- Complexity     3497     3504     +7
=========================================
  Files           104      105     +1
  Lines          9921     9929     +8
=========================================
+ Hits           9113     9121     +8
  Misses          808      808
…x to s-
Replace the baked-in asciiHeadingIds bool with a pluggable id transform on HeadingIdTracker (Closure(string): string). The core stays pure jgm/djot#393 (Unicode-preserving); ASCII folding now ships as AsciiHeadingIdsExtension, which sets the transform on both the renderer's tracker and the parser's heading-reference resolution so section ids and [Heading][] link targets stay in parity. The transform runs over the spec id and its output is re-slugged, so a transform that reintroduces separators (e.g. CJK romanization "ri ben yu") still yields a clean id. Also unify the leading-digit prefix from h- to s- so it matches the empty-heading s-N fallback (one prefix convention).
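The transform-then-re-slug order, plus the s- prefix and s-N fallback, can be sketched in Python as follows. This is illustrative only: heading_id, ascii_fold, and toy_romanize are hypothetical names, not djot-php's API. ascii_fold only strips Latin diacritics (via Unicode NFKD) and drops any remaining non-ASCII, and it preserves case. Real CJK romanization needs a transliteration library, so toy_romanize fakes it with a lookup table purely to show a transform that reintroduces spaces. What N denotes in s-N is also an assumption (a per-heading counter here).

```python
import re
import unicodedata
from typing import Callable, Optional

# Runs of ASCII characters that are not letters or digits (#393 separators).
_NON_ALNUM_ASCII_RUN = re.compile(r"[\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+")


def spec_slug(text: str) -> str:
    """Pure #393 id: collapse separator runs to '-', trim, keep case/non-ASCII."""
    return _NON_ALNUM_ASCII_RUN.sub("-", text).strip("-")


def ascii_fold(slug: str) -> str:
    """Hypothetical folding transform: 'Ü' -> 'U', 'é' -> 'e'.

    NFKD splits accented letters into base letter + combining mark; encoding
    to ASCII with "ignore" then drops the marks and any character with no
    ASCII base (e.g. CJK), which can leave an empty string.
    """
    return unicodedata.normalize("NFKD", slug).encode("ascii", "ignore").decode("ascii")


def toy_romanize(slug: str) -> str:
    """Hypothetical stand-in for a real transliterator (lookup table only).

    It returns space-separated syllables, i.e. it reintroduces separators.
    """
    return {"日本語": "ri ben yu"}.get(slug, slug)


def heading_id(text: str, n: int,
               transform: Optional[Callable[[str], str]] = None) -> str:
    """Sketch of the id pipeline; n is a hypothetical per-heading counter."""
    slug = spec_slug(text)                      # 1. pure #393 id
    if transform is not None:
        slug = spec_slug(transform(slug))       # 2. transform, then re-slug
    if not slug:
        return f"s-{n}"                         # 3. empty result: s-N fallback
    if "0" <= slug[0] <= "9":
        return f"s-{slug}"                      # 4. leading ASCII digit: s- prefix
    return slug


print(heading_id("Über café", 1))                # Über-café
print(heading_id("Über café", 1, ascii_fold))    # Uber-cafe
print(heading_id("日本語", 2, toy_romanize))      # ri-ben-yu
print(heading_id("日本語", 3, ascii_fold))        # s-3
print(heading_id("9 lives", 4))                  # s-9-lives
```

Re-slugging after the transform is what turns "ri ben yu" into 'ri-ben-yu'. Without step 2's outer spec_slug, the raw spaces would end up in the id.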
Rewrite the heading-id reference section for the jgm/djot#393 default (letter case and non-ASCII preserved, s- prefix for leading digits, s-N empty fallback) and document AsciiHeadingIdsExtension as the opt-in ASCII-folding path, including its parser/renderer parity and the note that registration order does not matter.
The headingIdTransformer property lost its (string): string signature when phpcbf mangled the single-line annotation (6ea8855) into invalid syntax and the workaround dropped to a bare \Closure|null. Restore the signature with the two-line var + phpstan-var pattern already used by FrontmatterExtension: the plain var stays \Closure|null (phpcbf leaves it untouched) and the parenthesized phpstan-var carries the full (\Closure(string): string)|null. Verified phpcbf no longer mangles it; phpcs and phpstan are clean.
Aligns auto-generated heading ids with the settled jgm/djot#393 rule, and adds ASCII
folding back as a pluggable, opt-in extension for sites that want maximum-portability
anchors.
Default change (spec-safe, per #393)
The identifier is formed by replacing each maximal run of non-alphanumeric ASCII with
a single '-' and trimming leading/trailing '-'. Letter case and all non-ASCII characters
are preserved.
This drops two djot-php-specific behaviors that diverged from the spec:
- Always-on ASCII transliteration (Über -> Uber): now opt-in via AsciiHeadingIdsExtension.
- The '_' exception: '_' is non-alphanumeric ASCII, so it is now replaced (#393 removed
  the per-character exceptions, #391).
Heading text | Old default | New default (#393)
Über café | Uber-cafe | Über-café
under_score | under_score | under-score
日本語 | ri-ben-yu | 日本語
a--b | a-b | a-b

A leading-digit result is prefixed 's-' ('9 lives' -> 's-9-lives') for CSS-selector safety
(querySelector('#9-x') throws); this is orthogonal to #393, which governs punctuation only.
This PR also unifies that prefix from the old 'h-' to 's-', matching the existing
empty-heading 's-N' fallback. Empty results still fall back to 's-N'.
Opt-in: AsciiHeadingIdsExtension
ASCII folding is no longer a baked-in flag; it is a pluggable id transform. The core
HeadingIdTracker produces the pure #393 id and accepts an optional Closure(string): string;
AsciiHeadingIdsExtension sets that transform. It is wired to both the renderer's tracker
and the parser's heading-reference resolution, so <section id> and implicit [Heading][]
link targets stay in parity. The transform runs over the spec id and its output is
re-slugged, so a transform that reintroduces separators (e.g. CJK romanization) still
yields a clean id.
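The reason the transform must reach both the renderer and the parser can be shown with a small Python sketch. All names are hypothetical: IdContext stands in for whatever state djot-php's HeadingIdTracker and the reference-resolution pass share, and ascii_fold is the same toy diacritic-stripping transform as before. The leading-digit prefix and s-N fallback are omitted for brevity. If only the renderer applied the transform, the <section id> and the target of an implicit [Heading][] link would be computed differently, and the link would dangle.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Callable, Optional

_NON_ALNUM_ASCII_RUN = re.compile(r"[\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+")


def spec_slug(text: str) -> str:
    # Pure #393 id: collapse separator runs to '-', then trim.
    return _NON_ALNUM_ASCII_RUN.sub("-", text).strip("-")


def ascii_fold(slug: str) -> str:
    # Hypothetical folding transform (drops diacritics and non-ASCII leftovers).
    return unicodedata.normalize("NFKD", slug).encode("ascii", "ignore").decode("ascii")


@dataclass
class IdContext:
    """Hypothetical shared id state; transform=None means pure #393 ids."""
    transform: Optional[Callable[[str], str]] = None

    def id_for(self, text: str, explicit_id: Optional[str] = None) -> str:
        if explicit_id is not None:
            return explicit_id                      # {#my-id} always wins, unchanged
        slug = spec_slug(text)
        if self.transform is not None:
            slug = spec_slug(self.transform(slug))  # transform, then re-slug
        return slug


def render_section_id(ctx: IdContext, heading: str) -> str:
    # Renderer side: value written to <section id="...">.
    return ctx.id_for(heading)


def resolve_reference(ctx: IdContext, label: str) -> str:
    # Parser side: href target of an implicit [Heading][] link.
    return "#" + ctx.id_for(label)


# Parity: both sides share one context, so the link target matches the id.
shared = IdContext(transform=ascii_fold)
print(render_section_id(shared, "Über café"))       # Uber-cafe
print(resolve_reference(shared, "Über café"))       # #Uber-cafe

# Transform wired into the renderer only: the parser still emits the #393 id.
print(resolve_reference(IdContext(), "Über café"))  # #Über-café
```

Sharing one transform between both passes is what keeps the two ids equal, regardless of which side is set up first.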
Retaining old behavior / web-safe ASCII URL ids (migration)
This is a breaking change to default heading-id output. To get behavior close to the
previous default - and ASCII-only, percent-encode-free, selector-safe anchors for web
publishing - add the extension:
Notes on backward compatibility:
- The '_' exception is not restored ('_' -> '-'), and the prefix is now 's-' (not 'h-'), so
  a handful of ids differ even with the extension on. There is no flag that reproduces the
  old ids byte-for-byte; that is the deliberate spec alignment.
- Explicit ids (## Title {#my-id}) always win and are unchanged.
- …percent-encode); ASCII is only needed for maximum portability, which the extension
  provides.
Notes
- The official djot test suite is unchanged (its id cases are simple ASCII, case-preserved).
- Internal tests that pinned the old transliterating behavior were updated to the #393
  output; the extension and the pluggable transform have their own tests.