resolve_link's '.md#' substring check corrupts non-markdown URLs whose query string contains '.md#' #397

Description

@yumike

Summary

resolve_link in crates/rw-renderer/src/html.rs uses `url.contains(".md#")` (line 131) to detect markdown links that carry a fragment. The check is a plain substring match anywhere in the URL, so it also fires on any link whose query string happens to contain .md#, e.g. a viewer/renderer URL that takes a markdown filename as a parameter plus a fragment to jump to. Such a link is misclassified as a markdown link: the .md in its query string is silently stripped, and if the link is relative it is also rewritten under base_path.

Reproduction

Verified on main (head 1b39a1b) with a one-off test in crates/rw-renderer/src/html.rs calling `resolve_link(input, "/base/path")`:

| Input | Actual output | Expected |
| --- | --- | --- |
| `chart.png?file=spec.md#section` | `/base/path/chart.png?file=spec#section` | `chart.png?file=spec.md#section` (unchanged; not a markdown link) |
| `/api/render?file=spec.md#step-1` | `/api/render?file=spec#step-1` | `/api/render?file=spec.md#step-1` |
| `viewer.html?path=docs/spec.md#h1` | `/base/path/viewer.html?path=docs/spec#h1` | `viewer.html?path=docs/spec.md#h1` |

Control cases — these continue to work correctly:

| Input | Output (correct) |
| --- | --- |
| `spec.md#section` | `/base/path/spec#section` |
| `./page.md` | `/base/path/page` |
| `https://example.com/spec.md#x` | `https://example.com/spec.md#x` (unchanged; external) |
| `image.png?desc=summary` (no .md#) | `image.png?desc=summary` (unchanged) |

Two distinct silent corruptions on the bug cases:

  1. The query-string value spec.md becomes spec — strip_suffix(".md") runs on the path portion, but the path portion is everything before #, which includes the query string.
  2. The link target moves from "resolved relative to the current page" to "rooted under `/base/path/`" — turning a relative link to a same-directory tool into an absolute path that probably 404s.

Root cause

crates/rw-renderer/src/html.rs:118-161:

```rust
// Only process markdown links
if !url.ends_with(".md") && !url.contains(".md#") {
    return url.to_owned();
}

// Split URL into path and fragment
let (path_part, fragment) = if let Some(hash_pos) = url.find('#') {
    (&url[..hash_pos], Some(&url[hash_pos..]))
} else {
    (url, None)
};
```

The intent of the .md# branch is to catch `page.md#section` (a markdown link with a fragment), since `ends_with(".md")` alone would miss it. But the implementation accepts the substring anywhere — including inside query strings and inside fragment text.

After admitting the URL, the function splits on #, strips .md from the end of the path portion, and prefixes /base_path/ if the path is relative. Both transformations are wrong for non-markdown URLs.
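The sequence described above can be modelled with a small stand-in function. This is only a sketch of the behaviour as reported in this issue, not the real `resolve_link`: the name `resolve_link_sketch`, the `://` test for external links, and the handling of a leading `./` are assumptions made so the example reproduces the tables in the Reproduction section.

```rust
/// Hypothetical, simplified model of the behaviour described in this issue.
/// It is NOT the real `resolve_link`; it only mirrors the reported steps.
fn resolve_link_sketch(url: &str, base_path: &str) -> String {
    // Assumption: external links are returned untouched before any other check.
    if url.contains("://") {
        return url.to_owned();
    }

    // The substring check under discussion: matches ".md#" anywhere in the URL.
    if !url.ends_with(".md") && !url.contains(".md#") {
        return url.to_owned();
    }

    // Split at the first '#'. Everything before it, query string included,
    // is treated as the "path".
    let (path_part, fragment) = match url.find('#') {
        Some(pos) => (&url[..pos], &url[pos..]),
        None => (url, ""),
    };

    // Strips ".md" from the end of the "path", which may really be the end
    // of a query-string value.
    let stripped = path_part.strip_suffix(".md").unwrap_or(path_part);

    if stripped.starts_with('/') {
        // Absolute paths are not prefixed.
        format!("{}{}", stripped, fragment)
    } else {
        // Assumption: relative paths are rooted under base_path.
        let relative = stripped.strip_prefix("./").unwrap_or(stripped);
        format!("{}/{}{}", base_path.trim_end_matches('/'), relative, fragment)
    }
}
```

Under these assumptions, `resolve_link_sketch("chart.png?file=spec.md#section", "/base/path")` yields `/base/path/chart.png?file=spec#section`, matching the first row of the reproduction table, while `image.png?desc=summary` is returned unchanged.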

Suggested fix

Detect markdown links by looking only at the path portion of the URL, i.e. everything before the first ? or #, and checking that it ends with .md. Roughly:

```rust
fn is_markdown_link(url: &str) -> bool {
    // Split at the first '?' (query) or '#' (fragment); only consider the path part
    let path_end = url.find(|c: char| c == '?' || c == '#').unwrap_or(url.len());
    let path = &url[..path_end];
    path.ends_with(".md")
}

// In resolve_link, replacing the current substring check:
if !is_markdown_link(url) {
    return url.to_owned();
}
```

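Then the existing fragment split below is fine for the cases above, since it only runs for genuine markdown links. One caveat: a markdown link that also carries a query string, such as page.md?v=1#sec, would now be admitted, and because the existing split only looks for #, strip_suffix(".md") would see page.md?v=1 and leave it unchanged. If such links need to resolve, the split would also have to separate the query string.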
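As a quick check of the proposed predicate, the inputs from the Reproduction section can be written as a case table. This is a sketch rather than a test from the repository: the `CASES` constant and the last two inputs (a query-carrying markdown link and a fragment that itself contains .md#) are illustrative additions, not part of the original reproduction.

```rust
/// Proposed check: only the path portion (before the first '?' or '#')
/// decides whether a URL is a markdown link.
fn is_markdown_link(url: &str) -> bool {
    let path_end = url
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(url.len());
    url[..path_end].ends_with(".md")
}

/// (input, expected result) pairs. The first six come from the tables in
/// this issue; the last two are hypothetical extras.
const CASES: &[(&str, bool)] = &[
    // Bug cases: ".md#" appears only inside the query string.
    ("chart.png?file=spec.md#section", false),
    ("/api/render?file=spec.md#step-1", false),
    ("viewer.html?path=docs/spec.md#h1", false),
    // Control cases: genuine markdown links.
    ("spec.md#section", true),
    ("./page.md", true),
    // No ".md" in the path at all.
    ("image.png?desc=summary", false),
    // Hypothetical: markdown link with a query string is admitted.
    ("page.md?v=1#sec", true),
    // Hypothetical: ".md#" inside fragment text only.
    ("notes.txt#see-spec.md#x", false),
];
```

Iterating over `CASES` and comparing `is_markdown_link(url)` with the expected flag covers both the bug cases and the control cases. The predicate does not look at the scheme, so an external URL such as https://example.com/spec.md#x would still need to be filtered out by whatever check `resolve_link` applies to external links.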

Why it matters

Plausible real-world links that this breaks:

  • Links to in-repo docs viewers / preview tools: [Preview](preview.html?file=spec.md#h1).
  • Links to renderers or API endpoints that take a markdown filename as a query parameter.
  • Generated links from third-party tools that happen to include .md# in tracking parameters or fragment names.

The breakage is silent — no warning, no error, just a broken link. And because the bad branch also prefixes base_path, the result is often a 404 to a path that didn't exist before, which is hard to trace back to a markdown-rendering rule.

Impact

Low blast radius (depends on whether authors use such URLs), but always silent and confusing when it happens. The fix is small and targeted.
