Context
We recently built a markdown + YAML-frontmatter knowledge layer for a different domain (distilled engineering decisions rather than data-catalog metadata) and evaluated OKF closely as an alternative. We independently converged on almost exactly the same substrate — markdown files, frontmatter as the queryable surface, git for history, grep/cat for access, markdown links as untyped edges, permissive consumption. That convergence from a completely different starting point is, to us, strong evidence the core format is right.
This issue makes one foundational suggestion and three additive ones:
- Foundational (§1): don't make the file path the concept's identity. This is the one place we think OKF's current model actively works against the maintained-bundle use case, and the fix is a model change, not an optional field — so we argue it directly rather than dressing it up as backward-compatible.
- Additive (§2–§4): optional frontmatter/convention additions that need no breaking change and that consumers can ignore — a relationship index, a per-concept rationale log, and two small fields.
One honest framing up front: our corpus is machine-consumed — an agent retrieves and assembles across concept pages with grep/glob/read; nobody browses a tree. With that consumer, folder nesting buys nothing (recursive grep is flat anyway), so we go flat and treat the filename as identity. The one question we'd genuinely value your position on: is OKF's primary consumer a human browsing a catalog, or tooling retrieving across the bundle? For tooling, the flat + filename-identity argument below applies at any scale you'd keep in one bundle. If you deliberately optimize for human browsing, that — not identity — is the real reason to keep folders, and it's worth saying so explicitly. Happy to split this into separate issues if that's easier to triage.
| # | Problem | Suggestion |
|---|---------|------------|
| 1 (foundational) | Concept identity is the file path (§2); re-categorizing a concept changes its identity and breaks every inbound link | Stop treating the directory path as identity; the filename already is one. Go flat; express hierarchy with a scope: frontmatter field (split into multiple bundles if a corpus outgrows one) |
| 2 (additive) | Relationships live only in body prose (§5.3); reverse/impact queries require loading every body or a graph pass | Optional flat touches: frontmatter list (relationship kind stays in prose) |
| 3 (additive) | Overwrite (timestamp:, §4.1) + directory CRUD log.md (§7) lose the rationale for a change | Optional per-concept append-only rationale section |
1 — Foundational: don't make the path the identity
The problem
§2 defines Concept ID as the file path with .md stripped — so a concept's identity is its location. §5.1 confirms the consequence: bundle-relative links are stable only "when documents are moved within their subdirectory," i.e. explicitly not across subdirectories.
This means the single most natural operation on a maturing knowledge base — re-categorizing a concept once you understand it better (promoting a root concept into a domain, moving it to a better-fitting parent) — changes its identity and silently breaks every inbound link. And because OKF recommends absolute links precisely for their stability, the recommended style is the one most exposed. This taxes exactly the activity OKF's git-native, PR-reviewed model is otherwise great at: refactoring knowledge over time.
Fusing identity to location isn't an incidental detail — it's the one place the model actively fights the maintained-bundle use case. The target is the identity=path coupling specifically, not hierarchy: hierarchy and stable identity coexist in plenty of systems (git addresses blobs by content hash, not path; CMS permalinks are independent of a page's folder). And for a machine consumer, folders earn nothing on their own — recursive grep/glob (rg, **/*.md) traverse nesting transparently — so the clean fix is to drop them and make identity location-independent.
The fix — the filename is the identity; hierarchy is scope:
You don't need a new field. Make a concept's identity its filename (the basename, not the directory path), keep every concept in one flat directory, and express hierarchy with a scope: frontmatter field instead of nesting:
---
scope: metrics # the parent concept, referenced by its filename; metrics is itself a concept, not a folder
---
The file is event-count.md, so its identity is event-count — stable no matter what. scope: names the concept's single, direct parent (here the concept metrics.md), which is itself a first-class concept with its own page, not a dumb folder. Full ancestry is derived by following scope: upward; nothing bakes the path into a string. That makes the operations that break OKF today trivial:
- Promote a sub-concept to top-level: drop scope:. Filename (identity) and inbound links unchanged.
- Re-parent (a leaf or a whole subtree): edit that one concept's scope:; descendants follow automatically, since ancestry is derived, not stored per-node.
Progressive disclosure / index.md can be generated from scope: instead of mirroring a tree.
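To make the "ancestry is derived, not stored" point concrete, here is a minimal Python sketch. It assumes the scope: values have already been read from frontmatter into a dictionary; the concept daily-event-count and the function name ancestry are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory view of a flat bundle:
# identity (filename without .md) -> value of scope:, or None if top-level.
SCOPES: Dict[str, Optional[str]] = {
    "metrics": None,
    "event-count": "metrics",
    "daily-event-count": "event-count",  # illustrative concept, not from the spec
}


def ancestry(concept: str, scopes: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of parents, nearest first, by following scope: upward."""
    chain: List[str] = []
    seen = {concept}
    parent = scopes.get(concept)
    while parent is not None:
        if parent in seen:
            # A cycle would make ancestry undefined, so fail loudly.
            raise ValueError(f"scope cycle at {parent!r}")
        chain.append(parent)
        seen.add(parent)
        parent = scopes.get(parent)
    return chain


print(ancestry("daily-event-count", SCOPES))  # ['event-count', 'metrics']

# Promoting event-count to top-level is a one-field edit; its descendant
# follows automatically and no identity (filename) changes.
SCOPES["event-count"] = None
print(ancestry("daily-event-count", SCOPES))  # ['event-count']
```

The same walk could feed a generated index.md; how that index is laid out is left open here.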
If a corpus ever genuinely outgrows a single flat namespace, that's the signal to split it into multiple bundles — OKF already treats the bundle as the unit of distribution (§2) — not to reintroduce nesting. (It mirrors a "split, don't grow" rule we apply at the concept level; where exactly that threshold sits is a fair thing to debate.)
2 — Additive: flat frontmatter relationship list (cheap reverse + impact queries)
Problem
Per §5.3, relationships are expressed only as inline body links. Answering "what references X?" or "what would a change to X affect?" therefore requires loading every concept body (or running a graph pass — the bundled viewer reverses the link graph at consumption time). For a grep-only consumer, a CI check, or a large bundle, that's costly, and none of it is answerable from the frontmatter surface OKF otherwise positions as the queryable layer.
Proposed addition
An optional frontmatter touches: field — a flat list of the concept identities a concept relates to:
---
touches: [customers, sales]
---
The relationship kind stays in prose — a # Joins / # Integration Points section with one sentence per neighbour — exactly as §5.3 already prescribes. touches: is just the denormalized, greppable mirror of that prose:
- The prose section is the source of truth; touches: is a convenience index, scannable by frontmatter-only tooling (rg 'touches:.*customers') without parsing bodies.
- Non-transitive, and consumers cap how many neighbours they follow per task (we use 5–7) so blast-radius queries stay bounded.
- Entirely optional: bundles without touches: behave exactly as today (consumers reverse the body-link graph).
- Droppable without loss: if it ever drifts from the prose, delete it and grep the prose section instead — a small recall cost, zero migration.
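As an illustration of the reverse query, the following Python sketch builds a "what references X?" index from frontmatter alone. It is a sketch under assumptions: touches: is written as a single-line inline list as in the example above, the bundle is passed in as an in-memory mapping, and read_touches and reverse_index are hypothetical helper names. The tiny parser is not a YAML implementation.

```python
import re
from collections import defaultdict
from typing import Dict, List


def read_touches(text: str) -> List[str]:
    """Extract an inline `touches: [a, b]` list from a leading frontmatter block."""
    if not text.startswith("---\n"):
        return []
    end = text.find("\n---", 3)  # closing frontmatter delimiter
    if end == -1:
        return []
    front = text[4:end]
    match = re.search(r"^touches:\s*\[(.*?)\]\s*$", front, re.MULTILINE)
    if not match:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def reverse_index(bundle: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each concept identity to the concepts whose touches: list names it."""
    reverse = defaultdict(set)
    for concept, text in bundle.items():
        for target in read_touches(text):
            reverse[target].add(concept)
    return {target: sorted(sources) for target, sources in sorted(reverse.items())}


# Hypothetical bundle: identity -> file contents.
BUNDLE = {
    "orders": "---\ntouches: [customers, sales]\n---\n# Orders\n",
    "invoices": "---\ntouches: [customers]\n---\n# Invoices\n",
    "customers": "---\ntitle: Customers\n---\n# Customers\n",
}

print(reverse_index(BUNDLE)["customers"])  # ['invoices', 'orders']
```

Only the frontmatter block of each file is inspected, so bodies never need to be parsed for this query.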
Rationale
This makes reverse/impact analysis answerable from the frontmatter index alone — no graph engine, no full-body scan — which matters as bundles grow and for the many lightweight consumers OKF wants to enable. Keeping the relationship kind in prose rather than frontmatter preserves OKF's untyped-link philosophy (§5.3); touches: only adds a fast path for "what is connected to X," not a typed-edge model.
Caveat (worth documenting if adopted)
A denormalized list can drift from the prose. Position touches: as tool-maintained / regenerable from the prose section and explicitly droppable — an optimization, never a second source of truth.
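A drift check of the kind a CI job might run could look like the sketch below. It assumes neighbours appear in the body as ordinary markdown links to .md files and that, in a flat layout, the link target's basename is the concept identity; the names linked_concepts and drift and the sample concepts billing and regions are hypothetical. It also treats any link ending in .md as a concept link, which a real tool would probably want to tighten.

```python
import re
from pathlib import PurePosixPath
from typing import List, Set, Tuple

# Markdown link whose target ends in .md, with an optional #fragment.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")


def linked_concepts(body: str) -> Set[str]:
    """Concept identities linked from the body (basename without .md)."""
    return {PurePosixPath(target).stem for target in LINK.findall(body)}


def drift(touches: List[str], body: str) -> Tuple[List[str], List[str]]:
    """Return (listed in touches: but not linked, linked but not listed)."""
    linked = linked_concepts(body)
    declared = set(touches)
    return sorted(declared - linked), sorted(linked - declared)


BODY = (
    "# Joins\n"
    "Joined to [customers](customers.md) on customer_id.\n"
    "Feeds [sales](/sales.md#rollups).\n"
    "See also [regions](regions.md).\n"
)

print(drift(["customers", "sales", "billing"], BODY))  # (['billing'], ['regions'])
```

Because the prose is the source of truth, the second list is what a regeneration step would add to touches: and the first is what it would drop.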
3 — Additive: per-concept rationale / history (the "why", not just the "what changed")
Problem
OKF's temporal affordances are timestamp: (a single last-modified scalar that is overwritten, §4.1) and the optional directory-level log.md (§7), which records CRUD-style events — "Added table X", "Created playbook Y". log.md records that a file changed; it doesn't record how the understanding changed or why.
For concepts bound to a regenerable resource: (a BigQuery table), overwrite-on-change is exactly right — the truth lives in the source and the doc is a projection. But §4.4 explicitly blesses concepts that are not bound to a resource — Playbooks, Metrics, business processes — i.e. curated human judgment whose value is the reasoning. For those, the current-state body should still be overwritten to reflect current understanding — that part is correct. The gap is that the overwrite is silent: the prior understanding and the reason it was abandoned vanish with it. The only trace left is the git diff, which is line-level noise, not curated rationale — and a future maintainer is liable to relitigate a question that was already settled.
Proposed addition
A conventional (no schema change) per-concept history section — same spirit as the conventional # Schema / # Examples / # Citations headings already defined in §4.2:
# Decision Log
## 2026-06-09 — session claim becomes authoritative
Multi-domain customers broke host-based resolution; resolution now prefers the
signed session claim. Supersedes the prior host-subdomain approach.
## 2026-03-02 — no tenant-existence leak
Unresolvable orgs now return "not found" rather than "forbidden" after a pentest
finding that "forbidden" let attackers enumerate valid tenants.
- Append-only and immutable, dated entries recording the why. Prior entries are never edited, reordered, or deleted; a reversed decision is recorded as a new entry. A claim or invariant that becomes false is struck through (~~…~~) in place rather than removed, so the record still shows what was once believed and why it was dropped. (OKF may of course relax this to a looser convention.)
- The current-state body (overview, schema, examples) is still overwritten freely to reflect current understanding — only this log is append-only. Together they give a live snapshot plus the trail of how it got there.
- Distinct from the directory log.md: that's operational ("a file was added"); this is conceptual ("our understanding changed, and here's why").
- Purely a conventional heading — consumers ignore it like any other body content, so it's zero-cost to those who don't want it.
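The append-only rule is mechanical enough to lint. The sketch below is one possible check, assuming each entry starts at a `## ` heading and that adding ~~ markers is the only permitted change to an existing entry; entries and is_append_only are hypothetical names, and the sample text is abbreviated from the example above (\u2014 is the dash used in the entry headings).

```python
from typing import List


def entries(log: str) -> List[str]:
    """Split a Decision Log section into entries, one per '## ' heading.

    Strikethrough markers are removed so that striking a claim through
    in place does not count as editing the entry.
    """
    blocks: List[List[str]] = []
    for line in log.splitlines():
        if line.startswith("## "):
            blocks.append([line])
        elif blocks:
            blocks[-1].append(line)
    return ["\n".join(block).strip().replace("~~", "") for block in blocks]


def is_append_only(old_log: str, new_log: str) -> bool:
    """True if every old entry survives unchanged and in the same relative order."""
    remaining = iter(entries(new_log))
    # `in` on an iterator consumes it, so this is a subsequence test.
    return all(entry in remaining for entry in entries(old_log))


OLD = (
    "# Decision Log\n"
    "## 2026-03-02 \u2014 no tenant-existence leak\n"
    "Unresolvable orgs return not found.\n"
)
NEW = (
    "# Decision Log\n"
    "## 2026-06-09 \u2014 session claim becomes authoritative\n"
    "Resolution now prefers the signed session claim.\n"
    "## 2026-03-02 \u2014 no tenant-existence leak\n"
    "~~Unresolvable orgs return not found.~~\n"
)

print(is_append_only(OLD, NEW))  # True
print(is_append_only(NEW, OLD))  # False: an entry was deleted
```

A check like this would compare the section at two git revisions; how the two versions are obtained is outside the sketch.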
Rationale
A catalog answers "what is this." Curated, non-resource-bound concepts also need "why is it this, and what did we believe before?" Preserving that rationale trail lets readers extend a concept without re-deriving or contradicting settled decisions — and it directly serves the goal in §9 of OKF "remaining useful as bundles grow, get refactored, and are partially generated by agents." It's the one piece of a maintained bundle that can't be regenerated from the underlying source.
4 — Additive: two smaller fields (aliases, status)
Two more frontmatter fields earn their weight in our system and would slot cleanly into OKF's optional-field model:
aliases: — a flat list of alternate names for the concept, for synonym findability. A reader who greps "org lookup" should still find the concept whose canonical name is org-resolution; without aliases the page is invisible to natural phrasing.
aliases: ["organization lookup", "org matching", "tenant resolution"]
status: — active | deprecated, a cheap validity filter so consumers can exclude retired concepts from results. It pairs with not deleting: a deprecated concept is marked (and optionally moved to an archive area) rather than removed, so its history and inbound links survive.
Both are optional and ignorable — consumers that don't recognise them treat them as arbitrary extra keys (§4.1).
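For illustration, a consumer could combine the two fields roughly as follows. This is a sketch, assuming frontmatter has already been parsed into dictionaries and that a missing status: means active (our assumption, not something the spec states); the function name find and the concept host-subdomain-resolution are hypothetical.

```python
from typing import Dict, List

# Hypothetical parsed frontmatter: identity -> fields.
CONCEPTS: Dict[str, dict] = {
    "org-resolution": {
        "aliases": ["organization lookup", "org matching", "tenant resolution"],
        "status": "active",
    },
    "host-subdomain-resolution": {  # illustrative retired concept
        "aliases": ["tenant resolution"],
        "status": "deprecated",
    },
}


def find(term: str, concepts: Dict[str, dict], include_deprecated: bool = False) -> List[str]:
    """Return identities whose name or aliases contain the term, case-insensitively."""
    needle = term.lower()
    hits = []
    for name, meta in concepts.items():
        # Assumption: a concept without status: is treated as active.
        if meta.get("status", "active") == "deprecated" and not include_deprecated:
            continue
        candidates = [name] + list(meta.get("aliases", []))
        if any(needle in candidate.lower() for candidate in candidates):
            hits.append(name)
    return sorted(hits)


print(find("tenant resolution", CONCEPTS))  # ['org-resolution']
print(find("tenant resolution", CONCEPTS, include_deprecated=True))
# ['host-subdomain-resolution', 'org-resolution']
```

Consumers that ignore both fields simply skip this step and match on names and body text as they do today.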
Summary
§1 is a model change — we think path-as-identity is the one place OKF's current design works against maintained bundles, and we'd argue for fixing it rather than papering over it with an optional field. The fix needs no new field: the filename is the identity, the layout goes flat, and scope: carries hierarchy — splitting into multiple bundles if a corpus ever truly outgrows one.
§2–§4 are purely additive — optional fields/conventions; existing bundles stay conformant (§9), consumers that ignore them behave exactly as today, and they fit the additive minor-version policy (§11).
All of it is aimed at the maintained, evolving end of OKF usage — bundles humans curate and refactor over time, where identity stability, cheap impact analysis, and preserved rationale start to matter.
Thanks for publishing OKF as an open spec — the convergence we hit independently made us more confident in the approach, and these are offered in that spirit. Glad to open PRs against SPEC.md for any of them; §1 is the one we'd most like to discuss first.