Skip to content

Replacements::apply is documented as single-pass but performs N sequential html.replace() calls, allowing cascade between replacements #395

Description

@yumike

Summary

crates/rw-renderer/src/directive/replacements.rs documents Replacements::apply as performing all registered replacements in a single pass over the HTML, with O(1) total allocations. The implementation is in fact a loop of html.replace(needle, replacement) calls — N allocations, N passes — and the second replacement's needle can match output produced by the first. The cascading semantics are real and demonstrated by the crate's own test_replacement_order test, but the public API documentation tells callers something different.

Doc claim vs implementation

The module doc (lines 7–24):

```text
Instead of each directive calling `html.replace()` (O(N) allocation per call),
all directives register their replacements, then [apply()] performs
them in a single pass over the HTML string.

Performance

Naive approach (N handlers, M replacements each):
for handler in handlers: # N iterations
html = html.replace(...) # O(len) allocation per replace
Total: O(N × M × len) allocations

Replacements approach:
for handler in handlers: # N iterations
handler.post_process(&mut replacements) # collect only
replacements.apply(html) # single O(len) allocation
Total: O(1) allocation
```

The actual implementation (lines 80–84):

```rust
for (from, to) in self.items {
if html.contains(&from) {
*html = html.replace(&from, &to);
}
}
```

This is exactly the "naive approach" the doc says it's avoiding — N independent String::replace calls, each allocating a fresh String. There's an internal comment at line 77–79 (// For a small number of replacements, sequential replace is efficient enough...) that acknowledges this, but the public docs above it still claim O(1) and "single pass".

Cascade is real

The crate's own test test_replacement_order (around line 180 of the same file) shows the cascade:

```text
"aaa" with [("a", "bb"), ("bb", "c")] → "ccc" (not "bbbbbb")
```

The second replacement matches the first's output. With a true single-pass implementation (e.g. aho-corasick's multi-pattern replace), the result would be \"bbbbbb\" — each input character matched once against the patterns.

Why it doesn't currently produce wrong output

I traced the only built-in caller, TabsDirective::post_process (tabs/directive.rs:154-169), which registers:

  1. <rw-tabs data-id=\"N\"> → tabs-open HTML
  2. <rw-tab data-id=\"N\"> → panel-open HTML
  3. </rw-tab></div>
  4. </rw-tabs></div>

The critical check is whether replacement 3's needle (</rw-tab>) can match inside </rw-tabs>. It can't — </rw-tab> ends with > literally, while </rw-tabs> has s> after tab. No false match, no cascade today.

But this is incidental safety, not a guarantee — the safety depends on TabsDirective being the only registered consumer and on its particular tag-name choices. Any new directive that emits HTML containing substrings of another directive's needles silently corrupts. The API contract ("register and forget, single pass") gives callers no reason to suspect the order matters.

Suggested fix

Two options, depending on whether the perf claim or the semantics claim is the one you want to keep:

  1. Make the implementation actually single-pass. Replace the loop with aho-corasick-backed multi-pattern replace (the file even suggests this at line 69). Then the doc becomes truthful and the cascade goes away.
  2. Fix the doc to match the implementation. Drop the "single pass / O(1) allocation" claim, document the sequential semantics explicitly ("replacements applied in registration order; each replacement runs over the output of the previous one"), and warn callers that overlapping needles cascade. Maybe rename apply to apply_sequential to make the contract clearer.

(1) is the more robust answer — it's what the API name and module doc already imply, and it removes a footgun for future directive authors.

Impact

  • Today: silently safe due to TabsDirective's particular tag choices. No user-visible regression.
  • Future: any new container or post-processing directive can land overlapping replacements without warning. The corruption would be subtle (e.g., a closing tag converted twice) and would only show up in specific markdown structures.
  • API contract: callers reading the doc reasonably assume registration order doesn't matter; that assumption is currently wrong.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions