
Proposal: make rw an OKF (Open Knowledge Format) round-trip engine, with Backstage as the discovery/distribution layer #566

Description

@yumike

Summary

Google Cloud published the Open Knowledge Format (OKF) v0.1 in June 2026: a
vendor-neutral spec for organizational knowledge as plain markdown + YAML
frontmatter, in a directory hierarchy, with index.md navigation, a log.md
history, and ordinary markdown links forming a graph, shippable as a git repo
or tarball and readable by humans and AI agents alike.

That is, almost line for line, the design rw already implements for
documentation. This issue proposes positioning rw as the serving,
navigation, and human/agent-review engine for OKF bundles, with Backstage's
Software Catalog as the cross-bundle discovery/distribution registry. rw is
uniquely suited to this role because its sectionRef = kind:namespace/name is
already a Backstage entity ref.

Why this matters for rw

OKF is independent validation of rw's core thesis (markdown + frontmatter +
links + no build step + agent-readable/writable). The convergence is strong
enough that rw already serves a real OKF bundle with zero loading errors
(see the dry-run below). The gaps are small and well-scoped, and the pieces rw is
missing (a producer; cross-bundle discovery) are exactly the pieces Backstage
and the OKF reference agent already provide. The opportunity is to assemble a
full loop almost entirely from parts we own.

Evidence: dry-run of the ga4 OKF bundle through rw's loader

Tracing Google's bundles/ga4 (23 files) through scanner.rs → source.rs →
MetaFields → site_state.rs:

  • Every .md loads as a page; nothing 404s or errors. The bundle's root
    index.md satisfies rw's homepage requirement; viz.html is simply ignored
    (not .md/meta.yaml). No meta.yaml exists in the bundle, so none of rw's
    sidecar/virtual-page logic fires.
  • Three non-fatal gaps surfaced:
    1. Lossy frontmatter. MetaFields (no deny_unknown_fields, no catch-all)
      silently drops OKF's resource, tags, and timestamp. rw keeps only
      type (mapped to kind), title, and description.
    2. type → section distortion. type: is a serde alias for kind, and
      site_state.rs registers a section for any page with a kind. Since every
      OKF doc carries type, every page becomes its own section (e.g.
      BigQuery Table:default/events_, Reference:default/user_count),
      distorting navigation. kind is freeform/unvalidated, so the space in
      "BigQuery Table" doesn't crash but produces malformed entity refs.
    3. index.md redundancy. OKF index.md files are frontmatter-less link
      lists; rw renders them as content and builds its own sidebar → duplicate
      nav. Cosmetic, not fatal. (log.md, if present, renders as a normal page.)
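To make gap 2 concrete, here is a minimal sketch of what happens when every OKF `type` is treated as a section kind. `section_ref`, `is_well_formed_ref`, and the "no whitespace, ':' or '/'" rule are hypothetical illustrations, not rw's code and not Backstage's actual validator.

```rust
/// Hypothetical stand-in for rw's section registration: a page with a
/// `kind` (or its `type` alias) gets a ref of the form kind:namespace/name.
fn section_ref(kind: &str, namespace: &str, name: &str) -> String {
    format!("{kind}:{namespace}/{name}")
}

/// Illustrative rule only (not rw's or Backstage's actual validator):
/// every part is non-empty and contains no whitespace, ':' or '/'.
fn is_well_formed_part(part: &str) -> bool {
    !part.is_empty() && !part.chars().any(|c| c.is_whitespace() || c == ':' || c == '/')
}

fn is_well_formed_ref(r: &str) -> bool {
    let Some((kind, rest)) = r.split_once(':') else { return false };
    let Some((namespace, name)) = rest.split_once('/') else { return false };
    [kind, namespace, name].into_iter().all(is_well_formed_part)
}

fn main() {
    // Every OKF doc carries `type`, so every page becomes its own section.
    for (kind, name) in [("BigQuery Table", "events_"), ("Reference", "user_count")] {
        let r = section_ref(kind, "default", name);
        println!("{r} -> well-formed: {}", is_well_formed_ref(&r));
    }
}
```

The second ref passes the check yet still turns a leaf document into a section, which is why the fix below targets section registration itself rather than validation alone.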

Proposal

Position rw + Backstage as the two layers OKF leaves open:

| Selection layer | Solved by | Mechanism |
|---|---|---|
| Which bundle? (cross-bundle) | Backstage Software Catalog | Bundle ↔ catalog entity; query by kind/owner/namespace/tag |
| Which docs in the bundle? (intra-bundle) | rw | Navigation API + index.md progressive disclosure + rendering |
| Correcting the bundle (review loop) | rw | Inline comments + rw comment CLI (agent draft → human review → agent revise) |

Distribution reuses the existing path: rw backstage publish → S3 → backend
plugin serves via @rwdocs/core, frontend mounts @rwdocs/viewer.

Work items

1. OKF import (consume bundles faithfully)

  • Stop dropping unknown frontmatter: add resource/tags/timestamp to
    MetaFields (or a #[serde(flatten)] extra map) so they round-trip.
  • Decouple type from section-registration: gate section-registration
    behind structural kinds (e.g. domain/system/service) or add an
    OKF-ingest mode where type is metadata-only. (highest-impact fix)
  • Optionally suppress rw's auto-sidebar when an OKF index.md is present.
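A rough sketch of the three import fixes together, under loud assumptions: `Meta` is a hypothetical mirror of MetaFields whose `extra` map plays the role of a `#[serde(flatten)]` catch-all, the parser is a toy that only understands flat `key: value` lines (a real fix would stay on serde), `STRUCTURAL_KINDS` is an illustrative allow-list, and the OKF index.md detection is a heuristic of our own.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of MetaFields with a catch-all map; in rw this would
/// be a `#[serde(flatten)]` map of extra YAML values or similar.
#[derive(Debug, Default)]
struct Meta {
    kind: Option<String>,
    title: Option<String>,
    description: Option<String>,
    extra: BTreeMap<String, String>,
}

/// Toy parser for flat `key: value` frontmatter (no nesting, no YAML types).
fn parse_frontmatter(src: &str) -> Meta {
    let mut meta = Meta::default();
    for line in src.lines() {
        let Some((key, value)) = line.split_once(':') else { continue };
        let (key, value) = (key.trim(), value.trim().to_string());
        match key {
            // `type` stays an alias for `kind`, as in rw today.
            "kind" | "type" => meta.kind = Some(value),
            "title" => meta.title = Some(value),
            "description" => meta.description = Some(value),
            // resource, tags, timestamp, ... are preserved instead of dropped.
            _ => {
                meta.extra.insert(key.to_string(), value);
            }
        }
    }
    meta
}

/// Hypothetical allow-list of structural kinds that may register sections.
const STRUCTURAL_KINDS: [&str; 3] = ["domain", "system", "service"];

/// In OKF-ingest mode the kind/type value is metadata only and never
/// registers a section; otherwise only allow-listed structural kinds do.
fn registers_section(meta: &Meta, okf_ingest: bool) -> bool {
    match &meta.kind {
        Some(kind) if !okf_ingest => STRUCTURAL_KINDS.iter().any(|k| k.eq_ignore_ascii_case(kind)),
        _ => false,
    }
}

/// Heuristic (an assumption): an index.md without frontmatter whose non-blank
/// lines are all headings or list items is OKF navigation, so rw could skip
/// its own auto-sidebar for that directory.
fn is_okf_index(file_name: &str, body: &str) -> bool {
    file_name == "index.md"
        && !body.trim_start().starts_with("---")
        && body
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty())
            .all(|l| l.starts_with('#') || l.starts_with("- ") || l.starts_with("* "))
}

fn main() {
    let meta = parse_frontmatter("type: BigQuery Table\ntitle: events_\ntags: [ga4]\ntimestamp: 2026-06-01T00:00:00Z");
    println!("{meta:?}");
    println!("registers section: {}", registers_section(&meta, false));
}
```

Splitting on the first ':' keeps values such as ISO 8601 timestamps intact, which is enough for the round-trip point; nested or list-valued YAML is exactly where the real serde-based fix is needed.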

2. OKF export (the "export/refine producer")

  • Serialize an rw site to a conformant bundle: emit a type per page
    (default for kind-less pages), timestamp from git lastModified,
    bundle-relative /path.md links from the resolved link graph, index.md
    from the navigation tree, and optionally log.md from git history.
  • Explicitly not building a cold-start data-source producer (BigQuery →
    bundle); that is the OKF reference agent's lane.
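A minimal sketch of the export side, assuming a hypothetical flattened `Page` record with git lastModified already resolved to an ISO 8601 string. It emits frontmatter with a `default` type for kind-less pages, bundle-relative `/path.md` links, and an index.md per directory. The directory listing stands in for rw's real navigation tree, and log.md generation is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened view of an rw page, as an exporter might see it.
struct Page {
    path: String,          // site path without extension, e.g. "guides/setup"
    title: String,
    kind: Option<String>,  // rw kind, if the page has one
    last_modified: String, // git lastModified as ISO 8601, assumed precomputed
}

/// OKF-style frontmatter; kind-less pages get `type: default`.
fn okf_frontmatter(page: &Page) -> String {
    let ty = page.kind.as_deref().unwrap_or("default");
    format!(
        "---\ntype: {ty}\ntitle: {}\ntimestamp: {}\n---\n",
        page.title, page.last_modified
    )
}

/// Bundle-relative link in the spec's recommended `/path.md` form.
fn bundle_link(path: &str) -> String {
    format!("/{}.md", path.trim_matches('/'))
}

/// index.md for one directory, listing only the pages directly under it.
/// This approximates rw's navigation tree with the directory layout.
fn index_md(dir: &str, pages: &[Page]) -> String {
    let dir = dir.trim_matches('/');
    let prefix = if dir.is_empty() { String::new() } else { format!("{dir}/") };
    // Keyed by file name so entries come out sorted and deduplicated.
    let mut entries: BTreeMap<&str, String> = BTreeMap::new();
    for p in pages {
        if let Some(rest) = p.path.strip_prefix(prefix.as_str()) {
            if !rest.is_empty() && !rest.contains('/') {
                entries.insert(rest, format!("- [{}]({})", p.title, bundle_link(&p.path)));
            }
        }
    }
    entries.into_values().collect::<Vec<_>>().join("\n") + "\n"
}

fn main() {
    let pages = vec![
        Page { path: "guides/setup".into(), title: "Setup".into(), kind: None, last_modified: "2026-06-01T00:00:00Z".into() },
        Page { path: "guides/deploy".into(), title: "Deploy".into(), kind: Some("Guide".into()), last_modified: "2026-06-02T00:00:00Z".into() },
    ];
    print!("{}", okf_frontmatter(&pages[0]));
    print!("{}", index_md("guides", &pages));
}
```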

3. Backstage discovery/distribution

  • Register a published bundle as a catalog entity (a catalog-info.yaml
    emitter or a Backstage entity provider that ingests published bundles),
    using rw's existing sectionRef = kind:namespace/name as the bridge.
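As a sketch of the catalog-info.yaml emitter option, the code below parses a sectionRef and writes a minimal Backstage descriptor. Modeling the bundle as a `Resource` with `spec.type: okf-bundle`, the `example.com/rw-section-ref` annotation key, and attaching it to a System via `spec.system` are all assumptions for illustration. The strict ref parser also ignores Backstage's shorthand ref forms.

```rust
/// A Backstage-style entity ref `kind:namespace/name` (the shape of rw's sectionRef).
#[derive(Debug, PartialEq)]
struct EntityRef<'a> {
    kind: &'a str,
    namespace: &'a str,
    name: &'a str,
}

/// Strict parse: all three parts required (Backstage also accepts shorthand
/// forms with defaults, which this sketch does not handle).
fn parse_ref(s: &str) -> Option<EntityRef<'_>> {
    let (kind, rest) = s.split_once(':')?;
    let (namespace, name) = rest.split_once('/')?;
    if kind.is_empty() || namespace.is_empty() || name.is_empty() {
        return None;
    }
    Some(EntityRef { kind, namespace, name })
}

/// Minimal catalog-info.yaml for a published bundle (hypothetical modeling).
fn catalog_info(bundle: &str, owner: &str, section_ref: &str) -> Option<String> {
    let r = parse_ref(section_ref)?;
    let mut lines = vec![
        "apiVersion: backstage.io/v1alpha1".to_string(),
        "kind: Resource".to_string(),
        "metadata:".to_string(),
        format!("  name: {bundle}"),
        format!("  namespace: {}", r.namespace),
        "  annotations:".to_string(),
        format!("    example.com/rw-section-ref: \"{section_ref}\""),
        "spec:".to_string(),
        "  type: okf-bundle".to_string(),
        format!("  owner: {owner}"),
    ];
    // When the section is a System, attach the bundle to it via spec.system.
    if r.kind.eq_ignore_ascii_case("system") {
        lines.push(format!("  system: {}", r.name));
    }
    Some(lines.join("\n") + "\n")
}

fn main() {
    match catalog_info("payments-dataset", "team-payments", "system:default/payments") {
        Some(yaml) => print!("{yaml}"),
        None => eprintln!("not a kind:namespace/name ref"),
    }
}
```

An entity provider would produce the same entity shape from published bundles instead of writing files; the ref parsing is the shared piece either way.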

4. Agent query→fetch path (the differentiator)

  • Expose a catalog-query → bundle-fetch flow (e.g. an MCP server or a
    Backstage AI action) so an agent selects the right bundle via the catalog,
    then navigates docs via rw's nav/page APIs. renderSearchDocument()
    already produces clean plain text for the RAG/index path.
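The two-step flow can be sketched with in-memory stand-ins. `BundleEntry`, the `BundleReader` trait, and `FakeReader` are hypothetical; a real version would sit behind an MCP server or a Backstage action and call the catalog and rw's nav/page APIs over HTTP. Substring matching on `topic` is only a placeholder for the agent's own choice of pages.

```rust
/// Hypothetical catalog entry for one published bundle.
struct BundleEntry {
    name: &'static str,
    owner: &'static str,
    tags: &'static [&'static str],
}

/// Hypothetical stand-in for rw's navigation and page APIs.
trait BundleReader {
    fn navigation(&self, bundle: &str) -> Vec<String>;
    fn page(&self, bundle: &str, path: &str) -> Option<String>;
}

/// Step 1 (catalog): select bundles by optional owner and tag.
fn query<'a>(catalog: &'a [BundleEntry], owner: Option<&str>, tag: Option<&str>) -> Vec<&'a BundleEntry> {
    catalog
        .iter()
        .filter(|e| owner.map_or(true, |o| e.owner == o))
        .filter(|e| tag.map_or(true, |t| e.tags.iter().any(|x| *x == t)))
        .collect()
}

/// Step 2 (rw): walk the bundle's navigation and fetch the pages whose path
/// mentions `topic`, a placeholder for the agent's own selection logic.
fn fetch(reader: &dyn BundleReader, bundle: &BundleEntry, topic: &str) -> Vec<String> {
    reader
        .navigation(bundle.name)
        .into_iter()
        .filter(|path| path.contains(topic))
        .filter_map(|path| reader.page(bundle.name, &path))
        .collect()
}

/// In-memory reader so the sketch runs without a server.
struct FakeReader;

impl BundleReader for FakeReader {
    fn navigation(&self, _bundle: &str) -> Vec<String> {
        vec!["tables/events".to_string(), "metrics/user_count".to_string()]
    }
    fn page(&self, bundle: &str, path: &str) -> Option<String> {
        Some(format!("{bundle}:{path}"))
    }
}

fn main() {
    let catalog = [
        BundleEntry { name: "ga4", owner: "analytics", tags: &["okf", "bigquery"] },
        BundleEntry { name: "payments-dataset", owner: "payments", tags: &["okf"] },
    ];
    for bundle in query(&catalog, None, Some("bigquery")) {
        println!("{:?}", fetch(&FakeReader, bundle, "metrics"));
    }
}
```

Keeping the catalog query and the in-bundle navigation behind separate interfaces mirrors the split in the table above: the registry answers "which bundle?", rw answers "which docs?".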

Scope / non-goals

  • Not a cold-start producer (data source → bundle). Leave extraction +
    enrichment to OKF's reference agent.
  • Not a replacement for Backstage's catalog; rw is the serving/review engine,
    the catalog is the registry. The architecture also works with any entity
    registry or a standalone bundle index rw could host.

Open questions

  • Does rw rewrite absolute /path.md OKF links (the spec's recommended
    form)? The ga4 bundle uses relative links, so this is unverified — needs a
    test. Relative .md links already resolve.
  • Backstage fits org-owned bundles cleanly (a payments-dataset bundle ↔ the
    payments system entity) but is awkward for external/public bundles (the
    ga4 public dataset isn't "our system"). How do we register external bundles —
    third-party entities, or a lighter registry?
  • Bundle granularity: one catalog entity per bundle, or per top-level section?
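For the first open question (absolute /path.md links), the test would pin down that absolute and relative forms land on the same page. A minimal sketch of such a resolver follows; `resolve_md_link` is hypothetical and not rw's actual link resolver, and it deliberately ignores fragments and query strings.

```rust
/// Resolve a markdown link found in `from_page` (a bundle file path such as
/// "tables/events.md") to a bundle path without the ".md" suffix.
/// Hypothetical helper; fragments (#...) and queries are out of scope.
fn resolve_md_link(from_page: &str, href: &str) -> Option<String> {
    if href.contains("://") {
        return None; // external URL, not a bundle link
    }
    let target = href.strip_suffix(".md")?;
    // Absolute `/path.md` links start at the bundle root; relative ones start
    // at the directory containing the linking page.
    let (mut parts, rel): (Vec<&str>, &str) = match target.strip_prefix('/') {
        Some(abs) => (Vec::new(), abs),
        None => {
            let mut dir: Vec<&str> = from_page.split('/').collect();
            dir.pop(); // drop the linking page's own file name
            (dir, target)
        }
    };
    for seg in rel.split('/') {
        match seg {
            "" | "." => {}
            ".." => {
                parts.pop()?; // climbing above the bundle root is unresolvable
            }
            s => parts.push(s),
        }
    }
    Some(parts.join("/"))
}

fn main() {
    for href in ["/metrics/user_count.md", "../metrics/user_count.md", "sessions.md"] {
        println!("{href} -> {:?}", resolve_md_link("tables/events.md", href));
    }
}
```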

References

  • OKF v0.1 SPEC.md (frontmatter fields, index.md/log.md, link semantics)
  • OKF reference tooling: okf/src/reference_agent (enrichment producer +
    static visualizer; BigQuery-only, Python-only)
  • rw internals consulted: rw-meta/src/fields.rs, rw-storage-fs/src/{scanner,source}.rs,
    rw-site/src/site_state.rs, rw-sections, docs/metadata.md
