Skip to content

DirectiveProcessor isn't safe to reuse across documents, but the &mut self API invites it #394

Description

@yumike

Summary

DirectiveProcessor::process takes &mut self, suggesting "build once, process many" is supported. In reality, internal state isn't reset between calls, so reuse silently corrupts output (a doc that ends inside an unclosed fence leaves fence_char set, so the next doc's first line is treated as in-fence and no directives expand; warnings and active_containers also leak across calls). Every caller in this repo builds a fresh DirectiveProcessor per document — so the bug is latent — but the public &mut self API is a footgun the moment someone reuses an instance for performance.

Note (updated): the original report also covered MarkdownRenderer::render / render_markdown. That part is now resolved by the Pipeline refactor — see the bottom of this issue.

The defect — DirectiveProcessor::process

crates/rw-renderer/src/directive/processor.rs:229. process() calls process_with_depth(input, 0), which finalizes containers on exit at depth == 0. But these fields are never reset at the top of process(), so they carry over between separate calls:

  • fence: FenceTracker — if doc N ends inside an unclosed ``` block, fence_char = Some(...) persists; doc N+1's very first line sees in_fence() == true and process_line returns the line verbatim (no directives processed).
  • warnings: Vec<String> — appended to across calls; the caller has no way to know which warnings belong to which document.
  • active_containers: Vec<String> — normally drained by the top-level finalize, but a container left open at end-of-doc stays on the stack and bleeds into the next call.

Registered handlers (e.g. TabsDirective) are themselves &mut self and accumulate their own per-document state and warnings across calls too.

Latent failure mode (triggers as soon as a caller reuses a processor):

let mut p = DirectiveProcessor::new().with_container(TabsDirective::new());
let a = p.process("```\nfoo");                  // unclosed fence
let b = p.process(":::tab[X]\nhi\n:::");        // every line treated as in-fence
// b is identical to its input — tabs not expanded

Today's safety: both in-repo callers build a fresh DirectiveProcessor inside a per-render Pipelinecrates/rw-site/src/page.rs:437 and crates/rw-confluence/src/renderer.rs:154 — and Pipeline is consumed by MarkdownRenderer::render (documented "build a fresh one per render"). Pipeline is also deliberately Send but not Sync ("each document gets its own handler"). So no supported flow reuses a processor. But DirectiveProcessor is a public type with a public &mut self process, so direct reuse outside the pipeline remains a silent footgun.

Suggested fixes

Pick one of:

  1. Reset state at the top of process: clear fence, active_containers, warnings (and ask handlers to reset). Preserves the &mut self API and the implied reuse contract — matches what callers already assume.
  2. Change the API to consume self (fn process(self, input: &str) -> String). Makes reuse a compile error; signals one-shot intent. Costs: callers update (though they already build one per render, so the churn is small).

(1) or (2) both remove the latent-footgun shape. Given the Pipeline already consumes its directives per render, (2) is cheap here and the most honest signal.

Why file this if it doesn't trigger today

  • Anyone writing let mut p = DirectiveProcessor::...; p.process(a); p.process(b); reasonably expects the second call to be independent. The current implementation makes that wrong silently.
  • The in_fence carry-over is hard to debug — it manifests as directives mysteriously not expanding in an entire document, far from the actual cause (the previous document's unclosed fence).

Resolved since filing: MarkdownRenderer::render / render_markdown

The original Part 1 of this issue is no longer valid after the Pipeline refactor:

  • render_markdown is gone. The single entry point is now render(&self, markdown: &str, pipeline: Pipeline) -> RenderResult (crates/rw-renderer/src/renderer.rs:212) — it takes &self.
  • Every per-document mutable field the original report listed (id_counts, seen_first_h1, code_block_index, list_stack, alert_stack, image, pending_attrs, …) moved off MarkdownRenderer into a Walker that is constructed fresh on every render call (renderer.rs:224).
  • A compile-time contract (renderer.rs:264) now asserts MarkdownRenderer is Send + Sync specifically so it can be parked in an Arc and reused across request handlers — the exact "build once, render many" scenario the original issue feared is now the intended, safe usage.

So the renderer half is fixed; only the DirectiveProcessor half above remains.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions