Skip to content

PlantUML !include meta resolution is non-deterministic when two sections share both name and kind (HashMap iteration order leaks into output) #404

Description

@yumike

Summary

SiteSnapshot::get_entity (crates/rw-site/src/site.rs:46) — the implementation of MetaIncludeSource used by PlantUML !include systems/{sys|dmn|svc}_<name>.iuml resolution — picks the first section it finds by (name, kind). The order it walks is the order in which sections_by_name lists indices, which is built by iterating sections.keys() (crates/rw-site/src/site_state.rs:159-168) — Rust's default HashMap with RandomState, seeded per process.

When two sections happen to share both the last path segment (the lookup name) and kind, the chosen entity is non-deterministic across rw serve restarts. Same docs, same source, different rendered diagram and different link URL baked into the SVG, depending on which the hasher happened to put first this run.

Reproduction

Verified on main (head 7ed2ffd) with a focused unit test:

// crates/rw-site/src/site.rs
#[cfg(test)]
mod meta_include_nondet {
    use std::collections::HashSet;
    use std::sync::Arc;
    use rw_storage::MockStorage;
    use rw_kroki::MetaIncludeSource;
    use crate::{Site, PageRendererConfig};

    #[test]
    fn collision_picks_random_section_across_runs() {
        let mut picks = HashSet::new();
        for _ in 0..200 {
            // Two sections share last-segment name (\"payments\") and kind (\"system\").
            let storage = MockStorage::new()
                .with_document_and_kind(\"billing/payments\", \"Billing Payments\", \"system\")
                .with_document_and_kind(\"infra/payments\", \"Infra Payments\", \"system\");
            let site = Site::new(
                Arc::new(storage),
                Arc::new(rw_cache::NullCache),
                PageRendererConfig::default(),
            );
            let _ = site.navigation(None).unwrap(); // force initial load
            let snapshot = site.snapshot_for_test(); // expose via cfg(test)
            let info = snapshot.get_entity(\"system\", \"payments\").unwrap();
            picks.insert(info.title.clone());
        }
        // If resolution were deterministic, we'd see exactly one title across 200 runs.
        // With HashMap RandomState, both surface.
        assert_eq!(
            picks.len(),
            1,
            \"expected stable resolution, observed: {:?}\",
            picks
        );
    }
}

The assertion fails: across 200 same-process iterations, both \"Billing Payments\" and \"Infra Payments\" get picked. Across process restarts the symptom is the same — one or the other ends up in the rendered PlantUML diagram, and which one is observed depends on RandomState's per-process seed.

(The test sketch needs a small snapshot_for_test() shim because SiteSnapshot is pub(crate); the bug also reproduces end-to-end by rendering a markdown page with !include systems/sys_payments.iuml and grepping the resulting SVG for the embedded link URL across restarts.)

Root cause

crates/rw-site/src/site_state.rs:159-168:

let mut sections_by_name: HashMap<String, Vec<usize>> = HashMap::new();
for path in sections.keys() {          // <-- HashMap iteration: randomized
    if let Some(&idx) = path_index.get(path.as_str()) {
        let dir_name = last_segment(path);
        sections_by_name
            .entry(dir_name.to_owned())
            .or_default()
            .push(idx);                // <-- Vec order inherits iteration order
    }
}

crates/rw-site/src/site_state.rs:292-305:

pub fn find_sections_by_name(&self, name: &str) -> Vec<(&str, &Section)> {
    self.sections_by_name
        .get(name)
        .map(|indices| {
            indices.iter().filter_map(|&idx| {
                let path = &self.pages[idx].path;
                self.sections.get(path).map(|info| (path.as_str(), info))
            }).collect()
        })
        .unwrap_or_default()
}

crates/rw-site/src/site.rs:46-69:

impl MetaIncludeSource for SiteSnapshot {
    fn get_entity(&self, entity_type: &str, name: &str) -> Option<EntityInfo> {
        let raw_name = name.replace('_', \"-\");
        let (section_path, _section) = self
            .state
            .find_sections_by_name(&raw_name)
            .into_iter()
            .find(|(_, s)| s.kind == entity_type)?;   // <-- first match wins
        ...
        Some(EntityInfo {
            title,
            description: page.and_then(|p| p.description.clone()),
            url_path: has_content.then(|| format!(\"/{section_path}\")),
        })
    }
}

EntityInfo's url_path is then baked into the rendered diagram (PlantUML C4 macros include the URL as a link), so the per-run choice surfaces both as content and as a hyperlink in the SVG that ships to clients.

Suggested fix

Two changes, both small, both worth doing:

  1. Make resolution stable. Sort the per-name index list by path during construction:

    for path in sections.keys() { ... push(idx) ... }
    for indices in sections_by_name.values_mut() {
        indices.sort_by(|&a, &b| pages[a].path.cmp(&pages[b].path));
    }

    This guarantees identical resolution across process restarts. Equivalently, collect sections.keys() into a Vec and sort it before the for loop.

  2. Warn on ambiguity. In find_sections_by_name or in SiteSnapshot::get_entity, when more than one section matches (name, kind), emit tracing::warn! listing the conflicting paths. Operators currently have no signal that they have a name collision in their docs tree — the bad behavior is silent.

    Optional stronger variant: refuse to resolve and return None with a warning, so the unresolved !include becomes a visible diagram failure rather than a silently-wrong link.

Why it matters

  • C4 / PlantUML diagrams generated from meta includes are a core feature of this docs system. Non-determinism here means "the diagram you committed and reviewed yesterday may render with a different system link today," with no warning and no diff visible in the source tree.
  • The trigger (two sections sharing last-segment name and kind) is unusual but not pathological in large monorepos — "payments," "auth," "search" can recur across domains.
  • Output stability across restarts is a basic property of a deterministic renderer; the only thing breaking it here is HashMap iteration order in an internal index, which the fix corrects in a few lines without changing any public behavior.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions