Summary
Google Cloud published the Open Knowledge Format (OKF) v0.1 in June 2026 — a
vendor-neutral spec for organizational knowledge as plain markdown + YAML
frontmatter, in a directory hierarchy, with index.md navigation, a log.md
history, and ordinary markdown links forming a graph, shippable as a git repo
or tarball and readable by humans and AI agents alike.
That is, almost line for line, the design rw already implements for
documentation. This issue proposes positioning rw as the serving,
navigation, and human/agent-review engine for OKF bundles, with Backstage's
Software Catalog as the cross-bundle discovery/distribution registry — a role
rw is uniquely suited to because its sectionRef = kind:namespace/name is
already a Backstage entity ref.
Why this matters for rw
OKF is independent validation of rw's core thesis (markdown + frontmatter +
links + no build step + agent-readable/writable). The convergence is strong
enough that rw already serves a real OKF bundle with zero loading errors —
see the dry-run below. The gaps are small and well-scoped, and the pieces rw is
missing (a producer; cross-bundle discovery) are exactly the pieces Backstage
and the OKF reference agent already provide. The opportunity is to assemble a
full loop almost entirely from parts we own.
Evidence: dry-run of the ga4 OKF bundle through rw's loader
Tracing Google's bundles/ga4 (23 files) through scanner.rs → source.rs →
MetaFields → site_state.rs:
- Every
.md loads as a page; nothing 404s or errors. The bundle's root
index.md satisfies rw's homepage requirement; viz.html is simply ignored
(not .md/meta.yaml). No meta.yaml exists in the bundle, so none of rw's
sidecar/virtual-page logic fires.
- Three non-fatal gaps surfaced:
- Lossy frontmatter.
MetaFields (no deny_unknown_fields, no catch-all)
silently drops OKF's resource, tags, and timestamp. rw keeps
type→kind, title, description.
type → section distortion. type: is a serde alias for kind, and
site_state.rs registers a section for any page with a kind. Since every
OKF doc carries type, every page becomes its own section (e.g.
BigQuery Table:default/events_, Reference:default/user_count),
distorting navigation. kind is freeform/unvalidated, so the space in
"BigQuery Table" doesn't crash but produces malformed entity refs.
index.md redundancy. OKF index.md files are frontmatter-less link
lists; rw renders them as content and builds its own sidebar → duplicate
nav. Cosmetic, not fatal. (log.md, if present, renders as a normal page.)
Proposal
Position rw + Backstage as the two layers OKF leaves open:
| Selection layer |
Solved by |
Mechanism |
| Which bundle? (cross-bundle) |
Backstage Software Catalog |
Bundle ↔ catalog entity; query by kind/owner/namespace/tag |
| Which docs in the bundle? (intra-bundle) |
rw |
Navigation API + index.md progressive disclosure + rendering |
| Correcting the bundle (review loop) |
rw |
Inline comments + rw comment CLI (agent draft → human review → agent revise) |
Distribution reuses the existing path: rw backstage publish → S3 → backend
plugin serves via @rwdocs/core, frontend mounts @rwdocs/viewer.
Work items
1. OKF import (consume bundles faithfully)
2. OKF export (the "export/refine producer")
3. Backstage discovery/distribution
4. Agent query→fetch path (the differentiator)
Scope / non-goals
- Not a cold-start producer (data source → bundle). Leave extraction +
enrichment to OKF's reference agent.
- Not a replacement for Backstage's catalog; rw is the serving/review engine,
the catalog is the registry. The architecture also works with any entity
registry or a standalone bundle index rw could host.
Open questions
- Does rw rewrite absolute
/path.md OKF links (the spec's recommended
form)? The ga4 bundle uses relative links, so this is unverified — needs a
test. Relative .md links already resolve.
- Backstage fits org-owned bundles cleanly (a payments-dataset bundle ↔ the
payments system entity) but is awkward for external/public bundles (the
ga4 public dataset isn't "our system"). How do we register external bundles —
third-party entities, or a lighter registry?
- Bundle granularity: one catalog entity per bundle, or per top-level section?
References
- OKF v0.1 SPEC.md (frontmatter fields,
index.md/log.md, link semantics)
- OKF reference tooling:
okf/src/reference_agent (enrichment producer +
static visualizer; BigQuery-only, Python-only)
- rw internals consulted:
rw-meta/src/fields.rs, rw-storage-fs/src/{scanner,source}.rs,
rw-site/src/site_state.rs, rw-sections, docs/metadata.md
Summary
Google Cloud published the Open Knowledge Format (OKF) v0.1 in June 2026 — a
vendor-neutral spec for organizational knowledge as plain markdown + YAML
frontmatter, in a directory hierarchy, with
index.mdnavigation, alog.mdhistory, and ordinary markdown links forming a graph, shippable as a git repo
or tarball and readable by humans and AI agents alike.
That is, almost line for line, the design rw already implements for
documentation. This issue proposes positioning rw as the serving,
navigation, and human/agent-review engine for OKF bundles, with Backstage's
Software Catalog as the cross-bundle discovery/distribution registry — a role
rw is uniquely suited to because its
sectionRef = kind:namespace/nameisalready a Backstage entity ref.
Why this matters for rw
OKF is independent validation of rw's core thesis (markdown + frontmatter +
links + no build step + agent-readable/writable). The convergence is strong
enough that rw already serves a real OKF bundle with zero loading errors —
see the dry-run below. The gaps are small and well-scoped, and the pieces rw is
missing (a producer; cross-bundle discovery) are exactly the pieces Backstage
and the OKF reference agent already provide. The opportunity is to assemble a
full loop almost entirely from parts we own.
Evidence: dry-run of the
ga4OKF bundle through rw's loaderTracing Google's
bundles/ga4(23 files) throughscanner.rs→source.rs→MetaFields→site_state.rs:.mdloads as a page; nothing 404s or errors. The bundle's rootindex.mdsatisfies rw's homepage requirement;viz.htmlis simply ignored(not
.md/meta.yaml). Nometa.yamlexists in the bundle, so none of rw'ssidecar/virtual-page logic fires.
MetaFields(nodeny_unknown_fields, no catch-all)silently drops OKF's
resource,tags, andtimestamp. rw keepstype→kind,title,description.type→ section distortion.type:is a serde alias forkind, andsite_state.rsregisters a section for any page with a kind. Since everyOKF doc carries
type, every page becomes its own section (e.g.BigQuery Table:default/events_,Reference:default/user_count),distorting navigation.
kindis freeform/unvalidated, so the space in"BigQuery Table"doesn't crash but produces malformed entity refs.index.mdredundancy. OKFindex.mdfiles are frontmatter-less linklists; rw renders them as content and builds its own sidebar → duplicate
nav. Cosmetic, not fatal. (
log.md, if present, renders as a normal page.)Proposal
Position rw + Backstage as the two layers OKF leaves open:
index.mdprogressive disclosure + renderingrw commentCLI (agent draft → human review → agent revise)Distribution reuses the existing path:
rw backstage publish→ S3 → backendplugin serves via
@rwdocs/core, frontend mounts@rwdocs/viewer.Work items
1. OKF import (consume bundles faithfully)
resource/tags/timestamptoMetaFields(or a#[serde(flatten)]extra map) so they round-trip.typefrom section-registration: gate section-registrationbehind structural kinds (e.g.
domain/system/service) or add anOKF-ingest mode where
typeis metadata-only. (highest-impact fix)index.mdis present.2. OKF export (the "export/refine producer")
typeper page(default for kind-less pages),
timestampfrom gitlastModified,bundle-relative
/path.mdlinks from the resolved link graph,index.mdfrom the navigation tree, and optionally
log.mdfrom git history.bundle); that is the OKF reference agent's lane.
3. Backstage discovery/distribution
catalog-info.yamlemitter or a Backstage entity provider that ingests published bundles),
using rw's existing
sectionRef = kind:namespace/nameas the bridge.4. Agent query→fetch path (the differentiator)
Backstage AI action) so an agent selects the right bundle via the catalog,
then navigates docs via rw's nav/page APIs.
renderSearchDocument()already produces clean plain text for the RAG/index path.
Scope / non-goals
enrichment to OKF's reference agent.
the catalog is the registry. The architecture also works with any entity
registry or a standalone bundle index rw could host.
Open questions
/path.mdOKF links (the spec's recommendedform)? The
ga4bundle uses relative links, so this is unverified — needs atest. Relative
.mdlinks already resolve.payments system entity) but is awkward for external/public bundles (the
ga4public dataset isn't "our system"). How do we register external bundles —third-party entities, or a lighter registry?
References
index.md/log.md, link semantics)okf/src/reference_agent(enrichment producer +static visualizer; BigQuery-only, Python-only)
rw-meta/src/fields.rs,rw-storage-fs/src/{scanner,source}.rs,rw-site/src/site_state.rs,rw-sections,docs/metadata.md