kv-cache-concept-page#206

Merged
AndreasAbdi merged 3 commits into main from kv-cache-concept-page on Jun 22, 2026

@AndreasAbdi

Contributor

{
"project": "Model Atlas — KV Cache Canonical Concept Page",
"branchName": "kv-cache-concept-page",
"description": "Publish one canonical English kv-cache concept page, backed by the existing registry record and localized messages, so readers can understand why caching keys and values matters for serving speed and move cleanly between nearby serving and attention docs.",
"context": {
"customerAsk": "Add the canonical English concept page for kv-cache, because the site has the glossary term and related attention pages but still lacks the broad explainer that tells readers why caching keys and values matters for serving speed. Keep this as one small, mergeable page slice on top of current main. Scope: create the canonical concept page plus messages/en.json and any needed assets.json, backed by the existing published concept.kv-cache registry record; classify it correctly per the docs and data-model guidance; wire aliases, tags, and related-doc connections so it can be found from prefill, decode, prefill-decode-split, autoregressive-generation, attention, multi-query-attention, grouped-query-attention, and sliding-window-attention; and update only the focused validation/tests needed for touched content and discovery integrity. The explanation should define the KV cache in layperson terms, explain why it removes repeated work during generation, and clarify how it changes latency and memory tradeoffs without collapsing into kernel-level detail. Keep the slice English-only and avoid unrelated locale or tokenizer churn. Acceptance criteria: the kv-cache concept page renders on current main, is backed by the existing registry record and discovery metadata, and focused touched checks pass.",
"problem": "The repository already ships the published concept.kv-cache registry record, a glossary entry for KV cache, and nearby attention and serving pages, but it does not ship the canonical concept page that explains KV cache as the broad serving concept behind prompt reuse during generation. Readers therefore have a gap between the short glossary definition and the more specific pages for prefill, decode, prefill-decode-split, attention, multi-query-attention, grouped-query-attention, and sliding-window-attention, with no plain-language bridge that explains what is being saved, why repeated attention work can be skipped, and why the cache improves latency while increasing memory pressure.",
"solution": "Create a canonical concept page at /docs/concepts/kv-cache using the standard concept-page structure, localized English messages, and the existing concept.kv-cache registry record. Keep the copy simple and technical-layperson-friendly, connect the page to nearby glossary, generation-stage, and attention pages through focused discovery metadata and links, and add only the narrow validation needed to prove route, registry, message, and nearby discovery integrity."
},
"acceptanceCriteria": [
"A canonical docs page exists for kv-cache under the concepts docs tree, binds to registryId: concept.kv-cache, and renders in the standard docs shell.",
"The page uses colocated messages/en.json and any required assets.json, with reader-facing copy resolved through message keys rather than hard-coded prose in page.mdx.",
"The opening summary and primary sections explain, in simple language, what keys and values are being cached, why that removes repeated work during generation, and how the cache trades lower latency for higher memory usage.",
"Readers can move cleanly between the new KV-cache concept page and prefill, decode, prefill-decode-split, autoregressive-generation, attention, multi-query-attention, grouped-query-attention, and sliding-window-attention through the touched related-doc, alias, tag, or focused inline-link surfaces.",
"The implementation stays English-only and does not reopen unrelated locale, tokenizer, or broad taxonomy work beyond what is required for this page to land cleanly.",
"Focused validation proves the canonical KV-cache concept page resolves against the existing published registry record and that the touched discovery surfaces remain valid.",
"Quality gate: typecheck, lint, and targeted tests pass."
],
"userStories": [
{
"id": "kv-cache-concept-page-001",
"title": "Publish the canonical KV-cache concept page",
"description": "As a reader trying to understand why generation speed depends on cached attention state, I want one canonical KV-cache page so I can learn the broad idea before diving into serving stages or attention variants.",
"acceptanceCriteria": [
"A canonical concept page exists at /docs/concepts/kv-cache with frontmatter that binds to concept.kv-cache, plus colocated messages/en.json and any required local assets.json.",
"The page follows the concept template pattern and keeps page.mdx structural, with reader-facing text resolved through messages/en.json.",
"The page opens with one folded openingSummary and explains in plain language that earlier tokens produce keys and values that later decode steps can reuse instead of recomputing the whole prefix.",
"The page explains that the cache lowers repeated attention work during autoregressive generation while growing memory usage as sequence length and layer count increase.",
"The page avoids kernel-level implementation detail and remains understandable in isolation before narrowing into serving-stage or attention-variant examples.",
"The page renders in the standard docs shell.",
"Typecheck passes",
"Verify in browser using the Browser plugin"
],
"priority": 1,
"passes": true,
"notes": ""
},
{
"id": "kv-cache-concept-page-002",
"title": "Connect the KV-cache bridge page to nearby serving and attention docs",
"description": "As a reader exploring serving behavior, I want the canonical KV-cache page connected to nearby generation-stage and attention pages so I can move between the broad cache concept and concrete serving mechanisms without dead ends.",
"acceptanceCriteria": [
"The KV-cache concept page exposes clear navigation to prefill, decode, prefill-decode-split, autoregressive-generation, attention, multi-query-attention, grouped-query-attention, and sliding-window-attention through registry-backed related docs and any required focused inline links.",
"Touched discovery metadata keeps aliases and tags aligned with concept.kv-cache so representative queries such as kv cache, key-value cache, attention cache, and cache for decoding resolve to the canonical concept surface appropriately.",
"The page clearly explains the practical tradeoff that lower repeated compute and lower latency come at the cost of more live serving memory, without drifting into kernel-level or benchmark-heavy detail.",
"Any supporting copy or relationship changes outside the new page remain narrowly scoped to reader movement to or from the KV-cache concept page.",
"Typecheck passes",
"Tests pass",
"Verify in browser using the Browser plugin"
],
"priority": 2,
"passes": true,
"notes": ""
},
{
"id": "kv-cache-concept-page-003",
"title": "Add focused validation for page contract and KV-cache discovery",
"description": "As a maintainer, I want targeted automated proof for the KV-cache concept slice so registry, message, route, and nearby discovery regressions are caught without unrelated test churn.",
"acceptanceCriteria": [
"Automated validation or tests confirm the canonical KV-cache concept page resolves against the existing published concept.kv-cache registry record instead of creating a duplicate concept record.",
"Automated validation or tests cover the touched discovery surface for the page, such as route resolution, English message loading, and at least one related-doc or search-oriented expectation for the canonical KV-cache concept route.",
"Coverage remains focused on observable behavior for this page slice and does not expand into unrelated locale manifests, tokenizer pages, or broad inventory assertions.",
"Typecheck passes",
"Tests pass"
],
"priority": 3,
"passes": true,
"notes": ""
}
]
}
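
The latency and memory tradeoff that the description asks the page to explain can be made concrete with a little arithmetic. The following is a minimal sketch and is not part of this PR: the `ModelShape` fields and all function names are hypothetical, the example numbers are invented, and the formula assumes a plain per-layer cache with no quantization, paging, or sliding window.

```typescript
// Hypothetical model shape, for illustration only; not taken from this PR.
interface ModelShape {
  layers: number;        // transformer layers; each keeps its own cache
  kvHeads: number;       // key/value heads per layer
  headDim: number;       // dimension of each head
  bytesPerValue: number; // e.g. 2 for 16-bit values
}

// Bytes held by the cache for one sequence: one key vector and one value
// vector per token, per key/value head, per layer.
function kvCacheBytes(shape: ModelShape, seqLen: number): number {
  return 2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerValue * seqLen;
}

// Token positions processed (per layer) to generate `newTokens` tokens after
// a prompt of `promptLen` tokens when nothing is cached: every step
// reprocesses the whole prefix seen so far.
function positionsWithoutCache(promptLen: number, newTokens: number): number {
  let total = 0;
  for (let step = 0; step < newTokens; step++) {
    total += promptLen + step;
  }
  return total;
}

// With a cache, prefill processes the prompt once and every later decode
// step processes only the single newest token.
function positionsWithCache(promptLen: number, newTokens: number): number {
  if (newTokens <= 0) return 0;
  return promptLen + (newTokens - 1);
}

const exampleShape: ModelShape = { layers: 32, kvHeads: 8, headDim: 128, bytesPerValue: 2 };

console.log(kvCacheBytes(exampleShape, 4096));   // 536870912 bytes (512 MiB)
console.log(positionsWithoutCache(1000, 100));   // 104950
console.log(positionsWithCache(1000, 100));      // 1099
```

Under these assumptions the cached path touches roughly one hundredth of the token positions, while the cache itself grows linearly with sequence length and layer count, which is the tradeoff the acceptance criteria describe.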

@AndreasAbdi

Contributor Author

BLOCKING: GitHub reports mergeable: CONFLICTING and mergeStateStatus: DIRTY for this PR, so it cannot be merged as-is. Please rebase onto main, resolve the conflicts, rerun the required checks, and push an updated head. The conflicts include these touched review-scope files:

  • src/content/registry/concepts/kv-cache.json
  • src/lib/content/attention-module-page.test.ts
  • src/lib/content/decode-glossary.test.ts
  • src/lib/content/glossary-architecture-index.test.ts
  • src/lib/content/kv-cache-glossary.test.ts
  • src/lib/content/phase-4-japanese-attention-variant-proof-set.test.tsx
  • src/lib/content/prefill-decode-split-glossary.test.ts
  • src/lib/content/prefill-glossary.test.ts
  • src/lib/content/published-docs-registry-ids.ts
  • src/lib/search/rerank-search-results.ts
  • src/tests/content/attention-tag-landing.test.ts

Quality checks:

  • PASS: make test
  • PASS: bun run build
  • PASS: Browser QA on http://127.0.0.1:3456/docs/concepts/kv-cache and http://127.0.0.1:3456/docs/glossary/prefill
  • NOTE: gh pr checks and statusCheckRollup reported no live checks for this head.
  • NOTE: docs/internal/processes/manual-qa.md was not present at the path named in the review instructions, so I used direct browser verification against a local production build instead.

Project acceptance criteria:

  • PASS: A canonical docs page exists for kv-cache under the concepts docs tree, binds to registryId: concept.kv-cache, and renders in the standard docs shell. The new /docs/concepts/kv-cache route renders correctly and binds to concept.kv-cache.
  • PASS: The page uses colocated messages/en.json and any required assets.json, with reader-facing copy resolved through message keys rather than hard-coded prose in page.mdx. page.mdx stays structural and the prose lives in messages/en.json; assets.json is present.
  • PASS: The opening summary and primary sections explain, in simple language, what keys and values are being cached, why that removes repeated work during generation, and how the cache trades lower latency for higher memory usage. The rendered copy covers cache contents, repeated-work avoidance, and the latency/memory tradeoff in plain language.
  • PASS: Readers can move cleanly between the new KV-cache concept page and prefill, decode, prefill-decode-split, autoregressive-generation, attention, multi-query-attention, grouped-query-attention, and sliding-window-attention through the touched related-doc, alias, tag, or focused inline-link surfaces. The new page exposes those routes through inline links and related docs, and Prefill now links back to /docs/concepts/kv-cache.
  • PASS: The implementation stays English-only and does not reopen unrelated locale, tokenizer, or broad taxonomy work beyond what is required for this page to land cleanly. The change is English-only content plus the minimum route/search/test updates needed for canonicalization.
  • PASS: Focused validation proves the canonical KV-cache concept page resolves against the existing published registry record and that the touched discovery surfaces remain valid. src/lib/content/kv-cache-concept.test.ts covers registry binding, render behavior, and search/discovery behavior; surrounding touched tests also pass.
  • PASS: Quality gate: typecheck, lint, and targeted tests pass. make test passed; the branch progress notes also indicate bun run typecheck and bun run lint were run on this head.

Behavioral assertion check for passes:true stories:

  • PASS: kv-cache-concept-page-001 includes observable behavior criteria such as route render and browser verification.
  • PASS: kv-cache-concept-page-002 includes observable behavior criteria such as navigation/discovery behavior and browser verification.
  • PASS: kv-cache-concept-page-003 includes observable behavior criteria such as route resolution and search-oriented expectations, not only structural checks.

Docs writing standards:

  • PASS: The page opens with one folded summary and no duplicate title chrome.
  • PASS: The page is understandable in isolation and does not define the topic only through one architecture slot.
  • PASS: The opening summary and first sections explain why the topic matters in plain language.
  • PASS: The title and first mentions use full names before acronyms or shorthand.
  • PASS: Narrative sections have distinct jobs and do not repeat adjacent sections.
  • PASS: Math sections keep symbol-only definitions directly under equations and avoid concept rows such as projections or grouping mechanics. No math section was added, so this remains conformant.
  • PASS: Customer-facing copy contains no phase, process, or authoring meta language.
  • PASS: Baseline templates and rendered copy contain no reader-shortcut callouts.
  • PASS: The page follows the companion quality checklist in docs-quality-standards.md.

Docs quality standards:

  • PASS: One folded summary appears at the top with no duplicate title chrome.
  • PASS: The page works for a first-time reader without requiring adjacent pages.
  • PASS: The first sections explain both the concept and its value in plain language.
  • PASS: Titles and first mentions expand full names before acronyms or shorthand.
  • PASS: Customer-facing message files contain no phase, process, or meta language.
  • PASS: Math sections carry symbol-only definitions directly under equations. Not applicable here and not violated.
  • PASS: Graphs, tables, and captions follow graphing standards. No new graph or chart surface was added.
  • PASS: Narrative sections stay scannable, each paragraph advances one idea, and each section contributes something new.

General website standards:

  • PASS: The change respects architecture and dependency boundaries. The page uses existing docs components and registry/search infrastructure instead of new ad hoc plumbing.
  • PASS: Data flow and ownership are clear. Registry metadata remains the source of truth for aliases/tags/related ids.
  • PASS: Shared UI patterns are reused where appropriate. The page follows the established concept-page template and shared components.
  • PASS: Loading, empty, error, and success states are intentional. No new custom data-backed interactive surface was introduced.
  • PASS: Accessibility, responsive behavior, localization, and browser support were considered. The page renders inside the existing docs shell; manual browser QA and existing automated tests passed.
  • PASS: Test evidence matches the risk and user impact of the change. The branch includes focused behavior tests plus a full make test pass.

Graphing standards:

  • PASS: Not applicable. This PR does not add or modify a graph/chart surface.

Review rules outcome:

  • PASS: Correctness first. I did not find a content, routing, or search-ranking defect in the implementation itself.
  • PASS: Architecture and dependency fit. The change stays within existing docs, registry, and search patterns.
  • PASS: Readability and maintainability. The content/page split and the focused reranker change are understandable.
  • PASS: Appropriate tests and quality evidence. Focused tests, full make test, build, and browser QA all passed.
  • BLOCKING: The PR still requires a rebase and conflict resolution before it is mergeable.

Once the branch is rebased cleanly and pushed, this is ready for a short follow-up review rather than a redesign.

@AndreasAbdi force-pushed the kv-cache-concept-page branch from 0b040f2 to cccb60f on June 22, 2026 04:56
@AndreasAbdi

Contributor Author

Addressed the blocking mergeability feedback from 2026-06-22 04:29 UTC.

What changed:

  • Rebasing outcome: force-updated kv-cache-concept-page onto current main, so the PR head is now cccb60faf94c10856ef61dbd02d81212c0ba1e87 and GitHub now reports mergeable: MERGEABLE instead of CONFLICTING.
  • Shared registry-id fix: updated src/lib/content/published-docs-registry-ids.ts so concept.kv-cache resolves deterministically to the canonical concept entry when both /docs/glossary/kv-cache and /docs/concepts/kv-cache are published for the same registry record; unsupported duplicate combinations still fail explicitly.
  • Review-scope expectation updates: refreshed the touched registry-driven and route-render tests to expect /docs/concepts/kv-cache where shared concept.kv-cache lookups should now land, including RegistryComparisonTable, batching, continuous batching, inference engine, memory, speculative decoding, reconciliation, and architecture-index coverage.
  • Focused validation follow-up: added a direct assertion in src/lib/content/published-docs-registry-ids.test.ts for getPublishedDocsEntryByRegistryId("concept.kv-cache") and getPublishedDocsHrefForRecord(...) resolving to /docs/concepts/kv-cache.
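
The shared registry-id fix described above can be pictured roughly as follows. This is a hedged sketch of the described behavior, not the code in src/lib/content/published-docs-registry-ids.ts: the `PublishedDocsEntry` shape and the `resolveCanonicalEntry` and `sectionOf` helpers are hypothetical, and it assumes the only supported duplicate is one glossary route plus one concepts route for the same registry id.

```typescript
// Hypothetical entry shape; the real module's types may differ.
interface PublishedDocsEntry {
  registryId: string;
  href: string; // e.g. "/docs/concepts/kv-cache"
}

type Section = "concepts" | "glossary" | "other";

function sectionOf(href: string): Section {
  if (href.startsWith("/docs/concepts/")) return "concepts";
  if (href.startsWith("/docs/glossary/")) return "glossary";
  return "other";
}

// Pick one canonical entry for a registry id. A glossary + concepts pair
// resolves to the concepts entry; any other duplicate combination throws.
function resolveCanonicalEntry(
  entries: PublishedDocsEntry[],
  registryId: string,
): PublishedDocsEntry | undefined {
  const matches = entries.filter((entry) => entry.registryId === registryId);
  if (matches.length <= 1) return matches[0];

  const concepts = matches.filter((entry) => sectionOf(entry.href) === "concepts");
  const glossary = matches.filter((entry) => sectionOf(entry.href) === "glossary");
  const supported =
    matches.length === 2 && concepts.length === 1 && glossary.length === 1;
  if (!supported) {
    throw new Error(`Unsupported duplicate published docs for ${registryId}`);
  }
  return concepts[0];
}

const published: PublishedDocsEntry[] = [
  { registryId: "concept.kv-cache", href: "/docs/glossary/kv-cache" },
  { registryId: "concept.kv-cache", href: "/docs/concepts/kv-cache" },
];

console.log(resolveCanonicalEntry(published, "concept.kv-cache")?.href);
// "/docs/concepts/kv-cache"
```

Throwing on every other duplicate shape keeps the lookup deterministic: a new, unplanned pairing surfaces as a test failure instead of silently picking whichever entry happens to come first.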

Validation run on this updated head:

  • PASS: make test
  • PASS: bun run build
  • PASS: bun run typecheck
  • PASS: bun run lint

PR state after push:

  • Diff still contains the canonical KV-cache page work plus the mergeability follow-up files above.
  • GitHub Actions CI started for this head at 2026-06-22 04:56 UTC and is currently in progress, so I am leaving the PR on the normal continue path until those checks finish.

@AndreasAbdi

Contributor Author

Superseding prior blocking mergeability feedback: the branch is now MERGEABLE, local validation passed, required CI checks on head cccb60faf94c10856ef61dbd02d81212c0ba1e87 are terminal and green, and I do not have remaining blocking issues.

Quality checks:

  • PASS: make test locally (2479 pass, 0 fail)
  • PASS: local production bun run build
  • PASS: required GitHub checks are green (lint, typecheck, test, test-verify-contract, coverage, test-build-contract, build-export, test-integration, validate-data, linkcheck, ci)
  • PASS: browser QA on http://127.0.0.1:3567/docs/concepts/kv-cache and http://127.0.0.1:3567/docs/glossary/prefill
  • NOTE: the review instruction path docs/internal/processes/manual-qa.md is not present in this worktree, so I verified directly against the local production build instead.

Project acceptance criteria:

  • PASS: A canonical docs page exists for kv-cache under the concepts docs tree, binds to registryId: concept.kv-cache, and renders in the standard docs shell. The new route exists at /docs/concepts/kv-cache, binds to concept.kv-cache, and rendered correctly in browser QA.
  • PASS: The page uses colocated messages/en.json and any required assets.json, with reader-facing copy resolved through message keys rather than hard-coded prose in page.mdx. page.mdx stays structural; the reader-facing copy is in messages/en.json; assets.json is present.
  • PASS: The opening summary and primary sections explain, in simple language, what keys and values are being cached, why that removes repeated work during generation, and how the cache trades lower latency for higher memory usage. The route explains saved keys/values, repeated-work avoidance, and the latency-memory tradeoff in plain language.
  • PASS: Readers can move cleanly between the new KV-cache concept page and prefill, decode, prefill-decode-split, autoregressive-generation, attention, multi-query-attention, grouped-query-attention, and sliding-window-attention through the touched related-doc, alias, tag, or focused inline-link surfaces. The canonical page exposes all of those routes, and touched glossary pages now point back to /docs/concepts/kv-cache.
  • PASS: The implementation stays English-only and does not reopen unrelated locale, tokenizer, or broad taxonomy work beyond what is required for this page to land cleanly. The slice is confined to the English concept page plus targeted registry/search/test updates needed for canonical routing and discovery.
  • PASS: Focused validation proves the canonical KV-cache concept page resolves against the existing published registry record and that the touched discovery surfaces remain valid. The new and updated tests cover registry-id lookup, canonical href resolution, render behavior, tag landing grouping, and representative search discovery.
  • PASS: Quality gate: typecheck, lint, and targeted tests pass. Local and CI checks passed.
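
As an illustration of the "representative search discovery" expectation above, the sketch below maps the representative queries named in the PR description to the canonical route. It is an assumption-laden example rather than the repository's reranker: `normalizeQuery`, `resolveAlias`, and the alias list constant are hypothetical names, and real alias data lives in the registry record.

```typescript
// Representative queries from the PR description, used here as aliases.
const KV_CACHE_ALIASES: string[] = [
  "kv cache",
  "key-value cache",
  "attention cache",
  "cache for decoding",
];
const CANONICAL_HREF = "/docs/concepts/kv-cache";

// Lowercase, turn hyphens and underscores into spaces, collapse whitespace.
function normalizeQuery(query: string): string {
  return query.toLowerCase().replace(/[-_]+/g, " ").replace(/\s+/g, " ").trim();
}

// Normalized alias -> canonical href.
const aliasIndex = new Map<string, string>(
  KV_CACHE_ALIASES.map((alias): [string, string] => [normalizeQuery(alias), CANONICAL_HREF]),
);

function resolveAlias(query: string): string | undefined {
  return aliasIndex.get(normalizeQuery(query));
}

console.log(resolveAlias("KV-Cache"));          // "/docs/concepts/kv-cache"
console.log(resolveAlias("Key-Value  Cache"));  // "/docs/concepts/kv-cache"
console.log(resolveAlias("tokenizer"));         // undefined
```

Normalizing both the stored aliases and the incoming query with the same function means spelling variants such as hyphenation or casing do not need separate alias entries.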

Behavioral assertion check for passes:true stories:

  • PASS: kv-cache-concept-page-001 includes observable behavior criteria: route render, browser verification, and visible explanatory content.
  • PASS: kv-cache-concept-page-002 includes observable behavior criteria: related-doc and inline-link navigation to nearby serving and attention docs, plus browser verification.
  • PASS: kv-cache-concept-page-003 includes observable behavior criteria: canonical route resolution and search/discovery expectations, not only structural checks.

Docs writing standards:

  • PASS: 1. The page is understandable in isolation... It explains the cache generally before narrowing to serving stages and attention variants.
  • PASS: 2. The narrative body stays focused on the concept... I did not find page-meta or workflow prose in the customer-facing copy.
  • PASS: 3. The first sections explain both what the concept is and why it matters... The definition and payoff are explicit in the opening summary and first two sections.
  • PASS: 4. The title and first narrative mention use the full name before acronyms... The page establishes “Key-value cache” before “KV cache”.
  • PASS: 5. Each section has a distinct job... Definition, payoff, generation behavior, and memory tradeoff are separated cleanly.
  • PASS: 6. Mathematically heavy pages include the equations... needed... Not applicable here; the topic is explained accurately without omitted required math.
  • PASS: 7. Visually/structurally heavy pages include the best graph... needed... Not applicable here; no graph is required to understand this page slice.
  • PASS: 8. Math sections keep concise symbol-only definitions... Not applicable; no math section was added.
  • PASS: 9. Customer-facing copy contains no reader-shortcut callouts... None found.
  • PASS: 10. References and citations are present where factual claims need support... The page renders CitationList from the existing registry citations for the concept record.
  • PASS: 11. Related docs, tags, and citations support discovery... The page stands on its own while also exposing related docs and tags.
  • PASS: 12. The copy is concise, direct... The prose is compact and layperson-readable.

General website standards:

  • PASS: Architecture and dependency boundaries are respected. The slice reuses the existing docs shell, registry model, and search/runtime helpers instead of adding one-off plumbing.
  • PASS: Data flow and ownership are clear. Canonical route selection comes from published-docs registry lookup rather than ad hoc per-page logic.
  • PASS: Shared UI patterns are reused where appropriate. The page follows the established concept-page structure and shared components.
  • PASS: Loading, empty, error, and success states are intentional where relevant. No new data-backed interactive state was introduced beyond existing shared docs surfaces.
  • PASS: Accessibility, responsive behavior, localization, and browser support were considered. The built page rendered cleanly in desktop and narrow mobile viewport QA; English-only locale gating remains intact in tests.
  • PASS: Test evidence matches the risk and user impact of the change. The PR includes focused behavior checks and passed the full project gate.

Graphing standards:

  • PASS: Not applicable. This PR does not add or change a graph/chart surface.

Review rules outcome:

  • PASS: correctness before style or preference
  • PASS: solves the stated problem without obvious regression
  • PASS: architecture and dependency fit
  • PASS: readability and maintainability
  • PASS: appropriate tests and quality-check evidence
  • PASS: no hallucinated API, hidden side effect, or dead-code issue found in the touched implementation

No blocking issues remain. Merging.

@AndreasAbdi merged commit 153049b into main on Jun 22, 2026
11 checks passed