feat: match github's markdown fragment generation by katrinafyi · Pull Request #2153 · lycheeverse/lychee

katrinafyi · 2026-04-21T11:25:28Z

This copies the approach of Flet/github-slugger#56 for deriving fragment identifiers from markdown headings.

Basically, it deletes characters from certain Unicode categories, then lowercases the result and replaces spaces with `-`, and finally disambiguates duplicate IDs with numeric suffixes.
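As a rough illustration of the first two steps (before disambiguation), here is a minimal sketch. The function name `generate_base_id` is hypothetical. The filter is a deliberate simplification: it keeps only alphanumerics, spaces, `-` and `_` instead of the exact set of Unicode categories the PR removes, so it will not agree with GitHub on every input.

```rust
/// Hypothetical helper: derives an ID from heading text, before any
/// duplicate handling is applied.
fn generate_base_id(heading: &str) -> String {
    let kept: String = heading
        .chars()
        // ASSUMPTION: a simplified stand-in for the real category-based
        // deletion. Only alphanumerics, space, '-' and '_' survive here.
        .filter(|c| c.is_alphanumeric() || matches!(c, ' ' | '-' | '_'))
        .collect();
    // Lowercase the whole string, then turn each space into a hyphen.
    kept.to_lowercase().replace(' ', "-")
}

fn main() {
    for heading in ["Hello, World!", "What's new?", "snake_case & kebab-case"] {
        println!("{heading:?} -> {:?}", generate_base_id(heading));
    }
}
```

Note that each space becomes its own hyphen, so a deleted character between two spaces leaves a double hyphen (`snake_case--kebab-case`).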

To check the test cases, you can see this gist or create your own markdown file on github.

Fixes #2112. This can be verified with

$ cargo run -- https://raw.githubusercontent.com/adamlui/js-utils/refs/heads/main/minify.js/node.js/docs/zh-cn/README.md  -vvvvv --include-fragments

...

   [200] https://raw.githubusercontent.com/adamlui/js-utils/refs/heads/main/minify.js/node.js/docs/zh-cn/README.md#%EF%B8%8F-mit-%E8%AE%B8%E5%8F%AF%E8%AF%81 (at 28:10)

So, Rust's `to_lowercase` actually does *more* transformations than GitHub does, which leads to differences in certain weird cases. For example, SpecialCasing.txt says Greek `Σ` should lowercase to `σ` in most cases but to `ς` when it ends a word. Rust does this but GitHub does not.
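The difference is easy to see with the standard library alone. In the sketch below, the helper names `lower_whole` and `lower_per_char` are made up for illustration. `str::to_lowercase` applies the context-sensitive final-sigma rule, while lowercasing each `char` on its own has no word context and always yields `σ`.

```rust
/// Lowercases the whole string at once; `str::to_lowercase` applies the
/// final-sigma rule from SpecialCasing.txt.
fn lower_whole(s: &str) -> String {
    s.to_lowercase()
}

/// Lowercases each char independently; without word context,
/// 'Σ' always becomes 'σ'.
fn lower_per_char(s: &str) -> String {
    s.chars().flat_map(char::to_lowercase).collect()
}

fn main() {
    let word = "ΟΔΟΣ"; // Greek capitals, ending in sigma
    println!("{}", lower_whole(word)); // ends in 'ς' (U+03C2)
    println!("{}", lower_per_char(word)); // ends in 'σ' (U+03C3)
}
```

The per-character form is the one that, per the comment above, should line up with GitHub's behaviour for this particular case.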

mre

Just one more fly-by comment. ;)

mre · 2026-04-26T15:29:03Z

+        if this_suffix.is_some() {
+            self.next_suffixes.insert(candidate.clone(), ONE);
+        }


Can you help me understand this logic?

If base_id is "foo" and it's seen for the second time, candidate becomes "foo-1". You then insert "foo-1" -> 1 into the map. If the very next heading in the document is actually titled "# foo 1", the generator will:

Generate base_id = "foo-1".

See that "foo-1" is already in the map.

Increment it and produce "foo-1-1".

Doesn't this create a "jump" in numbering if headings naturally collide with generated suffixes? Would that be an issue?

katrinafyi · 2026-04-26 (edited)

I'm not sure what you mean by jump. Do you mean because the foo-1-1 ID has two hyphens? Or, are you worried about seeing a "foo" next and skipping some numbers?

If it's the second case, the next "foo" will resume the sequence from 2 and no numbers will be skipped. The incrementing numbers are always appended onto the original base_id.

Edit: I should add that avoiding conflicts with generated suffixes is the most complicated part of this code. There are two ways to write this code: one that has complicated after-generation logic (this code) and one that has complicated collision detection. I can try changing to the complicated collision detection version, which avoids this conditional insert (but introduces conditional queries).

Edit 2: the behaviour of headings that conflict with generated suffixes can also be seen in this test case https://github.com/rina-forks/lychee/blob/41362802c150490473c06155d0e11d2ccc3a2c6e/lychee-lib/src/extract/fragments.rs#L212
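The sequence discussed in this thread (`foo`, `foo-1`, `foo-1-1`, `foo-2`) can be reproduced with a small sketch. This is not the code from the PR: it is modeled on a simple occurrence-counter loop in the style of github-slugger, and the names `IdGenerator`, `occurrences` and `disambiguate` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical generator; tracks every ID handed out so far.
#[derive(Default)]
struct IdGenerator {
    /// Maps each issued ID to the last numeric suffix tried on top of it.
    occurrences: HashMap<String, usize>,
}

impl IdGenerator {
    /// Returns `base_id` if it is unused, otherwise `base_id-N` for the
    /// smallest untried N that does not collide with an issued ID.
    fn disambiguate(&mut self, base_id: &str) -> String {
        let mut candidate = base_id.to_string();
        while self.occurrences.contains_key(&candidate) {
            // The loop is only entered when `base_id` itself was issued
            // before, so its counter is always present.
            let n = self
                .occurrences
                .get_mut(base_id)
                .expect("base_id was issued earlier");
            *n += 1;
            candidate = format!("{base_id}-{n}");
        }
        // Record the issued ID (including generated ones like "foo-1") so
        // that a later heading with the same text is detected as a clash.
        self.occurrences.insert(candidate.clone(), 0);
        candidate
    }
}

fn main() {
    let mut ids = IdGenerator::default();
    // Base IDs for the headings "foo", "foo", "foo 1", "foo".
    for base in ["foo", "foo", "foo-1", "foo"] {
        println!("{}", ids.disambiguate(base));
    }
    // Prints: foo, foo-1, foo-1-1, foo-2 (one per line)
}
```

The counter is always stored under the original `base_id`, which is why the final `foo` resumes at 2 rather than skipping a number.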

katrinafyi · 2026-04-28

See if 4161267 is easier to follow.

mre · 2026-04-26T15:36:22Z

 macro_rules! load_fixture {
    ($filename:expr) => {{
-        let path = fixtures_path!().join($filename);
+        let path = test_utils::fixtures_path!().join($filename);


This change assumes that the crate using the macro has test_utils in its dependency tree under that specific name. It's not a big deal right now but if a sub-crate imports the macro but doesn't have test_utils as a direct dependency (or renames it), this will fail to compile. Using $crate::fixtures_path!() is usually safer for macros intended to be exported.

I haven't tested it, but this should compile and work:

#[macro_export]
macro_rules! load_fixture {
    ($filename:expr) => {{
        // $crate ensures that we always point to the fixtures_path
        // defined inside this specific crate.
        let path = $crate::fixtures_path!().join($filename);
        std::fs::read_to_string(path).unwrap()
    }};
}

Same for the other macro below.

mre

We can merge this. Added a few more comments, but they are quite nitpicky tbh. No blockers.

katrinafyi added 5 commits April 21, 2026 18:44

slugify start

1a3221b

docs and cutover

7138e93

add test

90552b3

clippy

4f5843b

typo

057d840

mre reviewed Apr 21, 2026

Comment thread lychee-lib/src/extract/markdown.rs Outdated

katrinafyi and others added 7 commits April 21, 2026 23:13

simpler disambiguation, maybe........

80079a9

move to fragments.rs

a8dbe64

rename things: "slug" -> "ID", and "slugify" -> "generate"

e9f6275

rewrite disambiguate to use one fewer hashmap operation in the base case

f509e88

rewrite generate_without_disambiguation to do two copies only

028abfc

it could be one-copy, but .to_lowercase() is probably much faster on strings than individual characters

Update fragments.rs

1fbc409

mre reviewed Apr 22, 2026

Comment thread lychee-lib/src/extract/fragments.rs

katrinafyi and others added 7 commits April 23, 2026 13:46

even more corner cases and research

945d881

use rstest and merge the two test cases

9593a00

comments

d3f212d

fix module comment

cea7224

more test for unassigned characters, up to unicode 16.0 atm

32d3e0c

spellchecker:ignore-next-line

5887f4d

Merge branch 'lycheeverse:master' into fragment-slugify

4136280

mre reviewed Apr 26, 2026

Comment thread lychee-lib/src/extract/fragments.rs

mre reviewed Apr 26, 2026

mre approved these changes Apr 26, 2026

katrinafyi added 4 commits April 28, 2026 12:40

use $crate for macro crate self-reference

b46aac4

use more complicated seen check and simpler next_suffixes insert

4161267

clippy and exclude edge cases when parsing

b5c0386

touch up comments after review

e58a23a

katrinafyi changed the title ~~fix: match github's markdown fragment generation~~ feat: match github's markdown fragment generation May 4, 2026

katrinafyi wants to merge 23 commits into lycheeverse:master from rina-forks:fragment-slugify